SOME INTERESTING FEATURES OF FREQUENCY CURVES

By Richmond T. Zoch 

Introduction 

It is well known that in the normal error curve the points of inflection are
equidistant from the mode. However, it has never been pointed out that this is
also a characteristic of all of the bell-shaped Pearson Frequency Curves. This
fact can be most easily shown by placing the mode at the abscissa X = 0.

Many rough checks have been developed for use in applying the Theory of
Least Squares. The second part of this paper develops a rough check on the
computation for use when fitting a Pearson Frequency Curve to a set of
observations. No rough checks on computation are given in textbooks on Pearson's
Frequency Curves.

At present it is customary to follow a separate procedure for each Type of
curve when computing the constants of a Pearson Frequency Curve. The
third part of this paper shows how a single system may be followed for all Types.
A single procedure is very desirable in order that the rough check of Part 2 may
be quickly applied.


Part 1. Points of Inflection 

Perhaps nothing brings out the limitations of the bell-shaped Pearson Curves
in a more striking manner than a discussion of their points of inflection. In
dealing with frequency curves it is well known that any curve can be fitted to a
given distribution and that the real problem in curve fitting is the selection of a
curve. Figures 1, 2, and 3 illustrate three hypothetical histograms. All three
of these histograms are bell-shaped, yet none of them will be closely fitted by
any of the Pearson Curves. The reasons will be pointed out presently.

The differential equation from which Pearson derived his system of frequency
curves is

$$\frac{dy}{dx} = \frac{y(x - P)}{b_2x^2 + b_1x + b_0}.$$


By putting x − P = X, i.e. by placing the mode at the abscissa X = 0, this
differential equation may be written:

$$\frac{dy}{dX} = \frac{yX}{\pm B_2X^2 \pm B_1X + B_0},$$

where the + or − sign is taken according to the type of the curve. (It will be
shown later that the constant term of the denominator must be less than zero.)
Since in the Type III curve $B_2$ is 0 and in the "Normal Curve" both $B_2$ and $B_1$
are 0, it will be advantageous to consider the general case of

$$\frac{dy}{dX} = \frac{yX}{F(X)},$$

where F(X) is an integral rational function of the nth degree, at once rather than
considering special cases first.


If

$$\frac{dy}{dX} = \frac{yX}{F(X)},$$

then

$$\frac{d^2y}{dX^2} = \frac{y}{[F(X)]^2}\left\{X^2 + F(X) - XF'(X)\right\}.$$

In order to locate the points of inflection, $d^2y/dX^2$ is equated to zero.
Then we have:

$$X^2 + F(X) - XF'(X) = 0. \qquad (1)$$

This equation is always of the same degree as F(X) except when F(X) is linear or
constant. Hence we have proved the Theorem: If y = G(X) be the solution
of the differential equation

$$\frac{dy}{dX} = \frac{yX}{F(X)},$$

then the number of points of inflection of y cannot exceed the degree of F(X)
when F(X) is of degree greater than one.

Now $F(X) = B_nX^n + B_{n-1}X^{n-1} + \cdots + B_2X^2 + B_1X + B_0$, whence
equation (1) can be written in the form:

$$(1-n)B_nX^n + (2-n)B_{n-1}X^{n-1} + (3-n)B_{n-2}X^{n-2} + \cdots + (1-r)B_rX^r + \cdots - 3B_4X^4 - 2B_3X^3 + (1 - B_2)X^2 + B_0 = 0.$$

Hence we have established the Theorem: The coefficient of the linear term of
X in the equation of the points of inflection is zero.
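As a small illustration (a Python sketch; the helper name `inflection_coefficients` is ours, not the author's), the left side of equation (1) can be built directly from the coefficients of $F$: the $X^k$ term of $F(X) - XF'(X)$ is $(1-k)B_k$, so the linear coefficient is always $0 \cdot B_1 = 0$.

```python
def inflection_coefficients(B):
    """Coefficients of X**2 + F(X) - X*F'(X), lowest power first.

    B = [B0, B1, ..., Bn] are the coefficients of F(X), lowest power first.
    """
    # The X**k term of F(X) - X*F'(X) is B_k - k*B_k = (1 - k)*B_k.
    coeffs = [(1 - k) * b for k, b in enumerate(B)]
    # Make room for the X**2 term contributed by X**2 itself.
    coeffs.extend([0] * (3 - len(coeffs)))
    coeffs[2] += 1
    return coeffs


# Cubic F(X) = 5X^3 + 2X^2 + 3X - 4: the linear coefficient of (1) vanishes.
print(inflection_coefficients([-4, 3, 2, 5]))  # [-4, 0, -1, -10]
```

For a linear F (Type III) the result is $X^2 + B_0 = 0$, in agreement with the case treated next.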




For the "Normal Curve" and also for Type III,

$$B_2 = B_3 = B_4 = \cdots = B_n = 0.$$

Hence the points of inflection of these two Types are given by $X = \pm\sqrt{-B_0}$.
For Types I and II, $B_2$ is positive and $B_3 = B_4 = \cdots = B_n = 0$, and the
points of inflection are

$$X = \pm\sqrt{\frac{-B_0}{1 - B_2}}.$$

Hence the points of inflection are undefined if $B_2 = 1$, are pure imaginary if $B_2 > 1$, and real if $B_2 < 1$.

For Types IV, V, VI and VII the coefficient of $X^2$ is negative; writing it as $-B_2$
with $B_2 > 0$, and with $B_3 = \cdots = B_n = 0$, the points of inflection are at

$$X = \pm\sqrt{\frac{-B_0}{1 + B_2}}.$$


In some of these Types it may happen that the abscissae of the points of
inflection, though real, will lie beyond the range of the curve. Thus Types III
and VI may have 1 or 2 points of inflection, the single point of inflection
occurring when $\sqrt{-B_0/(1 + B_2)}$ exceeds the distance from the mode to the end of
the range in the direction in which the range is limited. Type II may have 0 or 2
points of inflection, as there will be no real points of inflection when $B_2 \geq 1$.
Type I may have 0, 1 or 2 points of inflection. Types IV, V and VII, as well as the
"Normal Curve", always have two and only two points of inflection.
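For a quadratic denominator the inflection points can be computed directly. The sketch below is an illustration, not part of the paper: it takes `B2` as the signed coefficient of $X^2$ (so Types IV to VII, written above with $-B_2$, are passed a negative value), and checks the result against Student's $t$ curve, whose density $(1 + x^2/\nu)^{-(\nu+1)/2}$ satisfies $dy/dx = xy/[-(\nu + x^2)/(\nu+1)]$, i.e. $B_2 = -1/(\nu+1)$, $B_1 = 0$, $B_0 = -\nu/(\nu+1)$.

```python
import math


def inflection_points(B0, B2):
    """Inflection abscissae X (measured from the mode) for
    dy/dX = y X / (B2 X^2 + B1 X + B0), with B2 the signed X^2 coefficient.

    Equation (1) reduces to (1 - B2) X^2 + B0 = 0, so B1 plays no part.
    Returns None if B2 == 1 (undefined), () if the roots are imaginary,
    otherwise the pair (-h, h), which is symmetric about the mode.
    """
    if B2 == 1:
        return None
    h2 = -B0 / (1 - B2)
    if h2 < 0:
        return ()
    h = math.sqrt(h2)
    return (-h, h)


# Student's t with nu degrees of freedom.
nu = 10
print(inflection_points(-nu / (nu + 1), -1 / (nu + 1)))
```

For $\nu = 10$ this returns $\pm\sqrt{10/12} \approx \pm 0.913$, the known inflection points $\pm\sqrt{\nu/(\nu+2)}$ of the $t$ curve; the symmetry of the pair about the mode is the property discussed in the next paragraph.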

Now it should be noted that when one of the eight bell-shaped Pearson curves
has two points of inflection, the abscissae of these two points of inflection are
equidistant from the abscissa of the mode. In Figure 1 a point of inflection will
be at abscissa b and another at abscissa a (M is the abscissa of the mode).
Since b − M ≠ M − a, none of the Pearson curves will fit this histogram closely.
In Figure 2, points of inflection occur at abscissae a, b, and c. Since a Pearson
curve can have at most two points of inflection, no Pearson curve will fit this
histogram closely. In Figure 3 there are four points of inflection and no Pearson
curve will fit this histogram closely.


Part 2. Range 

Definition: A bell-shaped curve is a continuous curve which starts at zero 
(or zero as a limit), rises to a single maximum, at which maximum point the 
first derivative is zero, and then falls to zero (or zero as a limit). 

Or, more formally, y = G(x) is a bell-shaped curve if $G(x_1) = G(x_2) = 0$ and
if $G'(P) = 0$ and $G''(P) < 0$, where G(x) is continuous and does not vanish in the
interval from $x_1$ to $x_2$ and P is a unique point in this interval.

If a bell-shaped curve has the value of zero at two finite points, one on each 
side of the maximum (mode), it is said to be of limited range in both directions, 
or briefly, of limited range. 

If a bell-shaped curve has the value of zero at only one finite point it is said 
to be of limited range in one direction, or also of unlimited range in one direction. 

If a bell-shaped curve has the value of zero only at ±∞, i.e. at no finite points,
it is said to be of unlimited range in both directions, or briefly, of unlimited range.

Theorem I: If F(x) can be separated into a finite number of factors each
either of the form $(x - r_i)$ or $(x^2 + 2r_jx + r_j^2 + r_{0j}^2)$, where no real root is
repeated, and y = G(x) is a bell-shaped curve which is a solution of the differential
equation

$$\frac{dy}{dx} = \frac{y(x - P)}{F(x)},$$

then if F(x) has no real roots, y is of unlimited range in both directions; if all of
the real roots of F(x) lie on the same side of P, y is of limited range in one (that)
direction; if at least one real root of F(x) lies on one side of P and at least one on
the other side, y is of limited range in both directions.
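The classification in Theorem I can be phrased as a small function. This is a hedged sketch (the name `curve_range` is hypothetical); it also uses the step in the proof below that every real root of F makes y vanish, so the range ends at the real root nearest the mode on each side.

```python
import math


def curve_range(real_roots, P):
    """Range (lower, upper) of a bell-shaped solution of dy/dx = y(x - P)/F(x).

    real_roots are the real roots of F.  Since y vanishes at each of them,
    the curve ends at the root nearest the mode on each side; a side with
    no real root is unlimited (-inf or +inf).
    """
    if any(r == P for r in real_roots):
        raise ValueError("F(P) = 0: the solution cannot be bell-shaped")
    left = [r for r in real_roots if r < P]
    right = [r for r in real_roots if r > P]
    lower = max(left) if left else -math.inf
    upper = min(right) if right else math.inf
    return lower, upper


print(curve_range([], 0.0))            # no real roots: (-inf, inf)
print(curve_range([0.0], 2.0))         # roots on one side only: (0.0, inf)
print(curve_range([-1.0, 3.0], 0.5))   # roots on both sides: (-1.0, 3.0)
```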

Proof: If F(x) = 0 when x = P, we have

$$\frac{dy}{dx} = \frac{y}{g(x)},$$

where g(x) = F(x)/(x − P). This derivative is zero only when y = 0 or
g(x) = ±∞. Hence the solution does not have a finite maximum and therefore
is not a bell-shaped curve. If F(x) > 0 when x = P, then at x = P we have

$$\frac{d^2y}{dx^2} = \frac{y\,F(P)}{[F(P)]^2} = \frac{y}{F(P)},$$

which is greater than zero; and, since at a maximum the second derivative must
not be greater than zero, in this case the solution would have a minimum at
x = P and therefore would not be a bell-shaped curve. As the theorem concerns
only those solutions which are bell-shaped curves, F(x) < 0 when x = P. If
F(x) = 0 when x ≠ P, then $dy/dx = \pm\infty$ unless y is also zero. Assume y ≠ 0.
Since F(x) is negative, if y ≠ 0 when F(x) = 0, then $dy/dx \to -\infty$ as $F(x) \to 0$
for an x > P, and changes to $+\infty$ as F(x) changes sign on passing through the
value 0. Hence the curve would contain another maximum before falling to
zero, and therefore the solution is not a bell-shaped curve. Similar reasoning
holds for an x < P. Therefore if y ≠ 0 when F(x) = 0, the curve is not
bell-shaped. If y = 0 when F(x) = 0, the curve has its range limited at this point.
That is, any real number which makes F(x) vanish will also make y vanish if y
represents a bell-shaped curve. Hence if all of the real roots lie on the same side
of P, the curve is of limited range in that direction only, while if at least one of
the real roots lies on each side of P, the curve is of limited range in both
directions. If F(x) contains no real roots, it does not vanish for any real value of x.
In this case, by partial fractions the differential equation becomes:


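$$\frac{dy}{y} = \frac{k_{11}\,dx}{(x + r_1)^2 + r_{01}^2} + \frac{k_{12}\,dx}{(x + r_2)^2 + r_{02}^2} + \cdots + \frac{2k_{21}(x + r_1)\,dx}{(x + r_1)^2 + r_{01}^2} + \frac{2k_{22}(x + r_2)\,dx}{(x + r_2)^2 + r_{02}^2} + \cdots$$

On integrating,

$$y = C\left[(x + r_1)^2 + r_{01}^2\right]^{k_{21}}\left[(x + r_2)^2 + r_{02}^2\right]^{k_{22}}\cdots\exp\left\{\frac{k_{11}}{r_{01}}\arctan\frac{x + r_1}{r_{01}} + \frac{k_{12}}{r_{02}}\arctan\frac{x + r_2}{r_{02}} + \cdots\right\}.$$

Hence y does not vanish for a finite real value of x and the Theorem is fully
established.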
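A quick numerical check that the integrated form above satisfies the partial-fraction equation, for a single complex pair of roots. The parameter values are arbitrary illustrations, and the check uses a central difference rather than symbolic differentiation; it also confirms that y stays positive, as the argument requires.

```python
import math


def y_pair(x, k1, k2, r, r0, C=1.0):
    """y = C [(x + r)^2 + r0^2]^k2 * exp((k1/r0) arctan((x + r)/r0))."""
    q = (x + r) ** 2 + r0 ** 2
    return C * q ** k2 * math.exp(k1 / r0 * math.atan((x + r) / r0))


def dlogy(x, k1, k2, r, r0):
    """Partial-fraction right-hand side: [k1 + 2 k2 (x + r)] / [(x + r)^2 + r0^2]."""
    return (k1 + 2 * k2 * (x + r)) / ((x + r) ** 2 + r0 ** 2)


def numeric_dlogy(x, k1, k2, r, r0, h=1e-5):
    """Central-difference estimate of (dy/dx)/y for y_pair."""
    up = y_pair(x + h, k1, k2, r, r0)
    down = y_pair(x - h, k1, k2, r, r0)
    return (up - down) / (2 * h) / y_pair(x, k1, k2, r, r0)


params = dict(k1=0.7, k2=-1.3, r=0.4, r0=1.1)
for x in (-3.0, 0.5, 4.0):
    print(x, numeric_dlogy(x, **params), dlogy(x, **params), y_pair(x, **params) > 0)
```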

Theorem II: If F(x) can be separated into a finite number of factors each
either of the form $(x - r_{1i})$ or $(x^2 + 2r_{2j}x + r_{2j}^2 + r_{0j}^2)$, where no real root is repeated,
and y = G(x) is a bell-shaped curve which is a solution of the differential equation

$$\frac{dy}{dx} = \frac{y(x - P)}{F(x)},$$

then if y is of unlimited range, F(x) contains no real roots; if y is of limited range
in one direction, all of the real roots of F(x) lie on the same (that) side of P; if y
is of limited range in both directions, at least one of the real roots of F(x) lies on
one side of P and at least one on the other.

Proof: By partial fractions the differential equation may be written:


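$$\frac{dy}{y} = \frac{k_{11}\,dx}{x - r_{11}} + \frac{k_{12}\,dx}{x - r_{12}} + \cdots + \frac{k_{31}\,dx}{(x + r_{21})^2 + r_{01}^2} + \frac{k_{32}\,dx}{(x + r_{22})^2 + r_{02}^2} + \cdots + \frac{2k_{21}(x + r_{21})\,dx}{(x + r_{21})^2 + r_{01}^2} + \frac{2k_{22}(x + r_{22})\,dx}{(x + r_{22})^2 + r_{02}^2} + \cdots$$

and on integrating:

$$y = C(x - r_{11})^{k_{11}}(x - r_{12})^{k_{12}}\cdots\left[(x + r_{21})^2 + r_{01}^2\right]^{k_{21}}\cdots\exp\left\{\frac{k_{31}}{r_{01}}\arctan\frac{x + r_{21}}{r_{01}} + \cdots\right\}.$$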


Hence y = 0 for $x = r_{11}, r_{12}, \ldots$ and for no other finite values of x, provided
$k_{11}, k_{12}, \ldots$ are positive. If one or more of the $k_{1i}$ are negative, y = ∞ at such
points, and unless some $r_{1i}$ closer to P has previously made y vanish, the curve
is not bell-shaped. Therefore, for bell-shaped curves, the exponent of the factor
containing the real root nearest to P on each side of P is positive.
Therefore: if y is of limited range in both directions, at least one real root lies on
each side of P; if y is of unlimited range in one direction, all of the real roots lie
on the same side of P; if y is of unlimited range, F(x) contains no real roots. Hence
the Theorem is established.

The effect of repeated real roots will now be considered. If a real root is
repeated an odd number of times at x = r, then F(x) changes sign at x = r
and the first theorem is true. If a real root is repeated an even number of times
at x = r, then F(x) does not change sign at x = r and we know that either (a)
y = 0 at x = r; or (b) y is finite and ≠ 0 and $dy/dx = \pm\infty$ at x = r, i.e. there is a
point of inflection at x = r. It will now be shown that (b) cannot occur. If
case (b) is possible, y is continuous at x = r and $dy/dx = \mp\infty$ according as $r - P \gtrless 0$;


moreover, $dy/dx$ does not change sign in the neighborhood of the point x = r, and
$d^2y/dx^2$ changes sign from $+\infty$ to $-\infty$ or vice versa according as $r - P \gtrless 0$. Now

$$\frac{d^2y}{dx^2} = \frac{y}{[F(x)]^2}\left\{(x - P)^2 + F(x) - (x - P)F'(x)\right\}.$$

Whence, if y is finite and ≠ 0, $d^2y/dx^2$ does not change sign at x = r, because it is
possible to select a neighborhood such that

$$(x - P)^2 > \left|F(x) - (x - P)F'(x)\right|$$

for an x differing from r by ε, where ε is a small positive quantity (at a root of
even multiplicity F(r) = F'(r) = 0, while $(r - P)^2 > 0$). Therefore
case (b) is not possible and y = 0 when a real root is repeated an even number of
times. That is to say, the range of the curve is limited at a point where a real
root is repeated an even number of times. Thus Theorem I always holds for
repeated roots.

For Theorem II it is clear that this Theorem holds for repeated roots when a
non-repeated root lies closer to P, and on the same side, than the repeated root.
Suppose that the repeated root is the nearest root to P (on a given side of P).
Then by partial fractions:

$$\frac{dy}{y} = \frac{k_{11}\,dx}{x - r_{11}} + \frac{k_{12}\,dx}{(x - r_{11})^2} + \frac{k_{13}\,dx}{(x - r_{11})^3} + \frac{k_{41}\,dx}{x - r_{41}} + \frac{k_{42}\,dx}{x - r_{42}} + \cdots + \frac{k_{31}\,dx}{(x + r_{21})^2 + r_{01}^2} + \cdots + \frac{2k_{21}(x + r_{21})\,dx}{(x + r_{21})^2 + r_{01}^2} + \cdots$$

and on integrating:

$$y = C(x - r_{11})^{k_{11}}(x - r_{41})^{k_{41}}(x - r_{42})^{k_{42}}\cdots\left[(x + r_{21})^2 + r_{01}^2\right]^{k_{21}}\cdots\exp\left\{\frac{k_{31}}{r_{01}}\arctan\frac{x + r_{21}}{r_{01}} + \cdots - \frac{k_{12}}{x - r_{11}} - \frac{k_{13}}{2(x - r_{11})^2}\right\}.$$

Hence y can vanish only for $x = r_{11}$ or for $x = r_{41}, r_{42}, \ldots$, and for no other finite
values of x. Since by hypothesis y is bell-shaped, the proper $k_{ij}$ must be
positive and Theorem II always holds for repeated roots.

Theorems I and II can now be combined and generalized in the form: 

Theorem: If F(x) is a polynomial with real coefficients and y = G(x) is a
bell-shaped curve which is a solution of the differential equation

$$\frac{dy}{dx} = \frac{y(x - P)}{F(x)},$$

then the necessary and sufficient condition that y be of unlimited range in both
directions is that F(x) have no real roots; that y be of limited range in one
direction, that all of the real roots of F(x) lie on the same side of P; that y be of
limited range in both directions, that at least one real root of F(x) lie on one
side of P and one on the other.

Corollary: F(x) must be negative throughout the range of y.

Suppose now that we have some statistics which we wish to graduate, that the
statistics are of such a nature that we would expect a bell-shaped curve, rather
than a J- or U-shaped curve, and that we desire the best fit. If we use a curve which
is a solution of the differential equation

$$\frac{dy}{dx} = \frac{y(x - P)}{F(x)}$$

(the Pearson Curves being special cases) to fit the statistics, and if in computing
the constants for the curve one of the following cases arises:

(a) $b_0 \geq 0$ when this constant is computed,
or (b) $B_0 \geq 0$ when the origin is moved to the mode,
or (c) a root is located within the range of the statistics,

then it means that:

1. A mistake may have been made in the computation: thus the Theorem
just established provides a rough check on the work of computation.

2. If no mistake has been made in the computation, it may indicate that the
bell-shaped Pearson Curves will not closely fit the statistics and that some
other graduation curve should be used; e.g. the Gram-Charlier Types A or B might be
tried.

3. If no mistake has been made in the computation, it may happen that one
of the bell-shaped Pearson Curves will give an excellent fit, but a different method
than, or a modification of, the Method of Moments should be used in order to
compute the constants.
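The rough check lends itself to a few lines of code. In this sketch (function and argument names are hypothetical) a warning is raised whenever F fails to be negative at the mean or at the mode, or has a root inside the range of the data, following the Corollary; it assumes the origin for $b_0$ is the mean, which always lies inside the range.

```python
def rough_check(F_at_mean, F_at_mode, real_roots, data_min, data_max):
    """Return the warning signs (a) to (c) for a fitted dy/dx = y(x - P)/F(x).

    F_at_mean  -- constant term b0 of F with the origin at the mean
    F_at_mode  -- constant term B0 of F with the origin at the mode
    real_roots -- real roots of F, on the same x scale as the data
    By the Corollary, F must be negative throughout the range of y.
    An empty list means no warning sign was found.
    """
    warnings = []
    if F_at_mean >= 0:
        warnings.append("(a) b0 is not negative")
    if F_at_mode >= 0:
        warnings.append("(b) B0 is not negative")
    inside = [r for r in real_roots if data_min < r < data_max]
    if inside:
        warnings.append(f"(c) root(s) inside the range of the statistics: {inside}")
    return warnings


print(rough_check(-1.25, -0.9, [-6.0], -5.0, 7.0))   # no warnings
print(rough_check(0.3, -0.9, [2.0], -5.0, 7.0))      # (a) and (c)
```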

Part 3. Computing the Constants 

At present, the constants of a frequency curve are computed as follows:
First the moments are computed about an arbitrary origin, then the moments
about the A.M. are determined, then $\beta_1$ and $\beta_2$ and the criterion are computed,
after which the type of curve can be selected. From this point a separate
procedure is followed for each curve. Now in the above method one will not
know whether a root has been located in the range of the statistics or not.

Take Pearson's differential equation

$$\frac{dy}{dx} = \frac{y(x - P)}{b_2x^2 + b_1x + b_0}.$$

Put X = x − P. Then dX = dx and x = X + P, so that

$$\frac{dy}{dX} = \frac{yX}{b_2(X + P)^2 + b_1(X + P) + b_0} = \frac{yX}{b_2X^2 + 2Pb_2X + b_1X + P^2b_2 + Pb_1 + b_0}.$$

Now put

$$B_2 = b_2, \qquad B_1 = 2Pb_2 + b_1, \qquad B_0 = P^2b_2 + Pb_1 + b_0.$$

Then we have

$$\frac{dy}{dX} = \frac{yX}{B_2X^2 + B_1X + B_0},$$

or

$$\frac{dy}{dx} = \frac{y(x - P)}{B_2(x - P)^2 + B_1(x - P) + B_0}. \qquad (1)$$


It should be noted that for a particular curve, $B_2$, $B_1$ and $B_0$ are constants;
i.e., their values do not change with a change of the origin. The values of $b_1$
and $b_0$ do change with a change in the origin.
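The invariance of $B_2$, $B_1$, $B_0$ under a change of origin can be confirmed with exact rational arithmetic. The helpers below are illustrative: `move_origin` substitutes $x = x_{\text{new}} + h$ into $b_2x^2 + b_1x + b_0$ and shifts P by $-h$, and `mode_coefficients` applies the definitions of $B_2$, $B_1$, $B_0$ given above.

```python
from fractions import Fraction


def mode_coefficients(b2, b1, b0, P):
    """B2, B1, B0 of the denominator written in powers of X = x - P."""
    return b2, 2 * P * b2 + b1, P * P * b2 + P * b1 + b0


def move_origin(b2, b1, b0, P, h):
    """b2, b1, b0 and P after moving the origin to the old point x = h."""
    # b2 (x_new + h)^2 + b1 (x_new + h) + b0, collected in powers of x_new.
    return b2, b1 + 2 * b2 * h, b0 + b1 * h + b2 * h * h, P - h


b = (Fraction(-1, 11), Fraction(3, 7), Fraction(-2), Fraction(5, 3))
shifted = move_origin(*b, Fraction(7, 2))
print(shifted[:3] != b[:3])                                  # b1 and b0 changed
print(mode_coefficients(*b) == mode_coefficients(*shifted))   # B's unchanged
```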

If we clear equation (1) of fractions, multiply by $e^{\eta x}$, and integrate with respect
to x over the range from $x_1$ to $x_2$, where

$$e^{\lambda_1\eta + \lambda_2\eta^2/2! + \lambda_3\eta^3/3! + \cdots} = \int_{x_1}^{x_2} e^{\eta x}\,y\,dx,$$

then successively differentiate with respect to η, and equate coefficients of
like powers of η, we finally obtain:

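$$\begin{aligned}
&\lambda_1 - P + B_1 - 2PB_2 + 2B_2\lambda_1 = 0,\\
&\lambda_2 + B_0 - PB_1 + P^2B_2 + B_1\lambda_1 - 2PB_2\lambda_1 + 3B_2\lambda_2 + B_2\lambda_1^2 = 0,\\
&\lambda_3 + 2\lambda_2B_1 - 4PB_2\lambda_2 + 4B_2\lambda_3 + 4B_2\lambda_1\lambda_2 = 0,\\
&\lambda_4 + 3B_1\lambda_3 - 6PB_2\lambda_3 + 5B_2\lambda_4 + 6B_2\lambda_2^2 + 6B_2\lambda_1\lambda_3 = 0.
\end{aligned} \qquad (2)$$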
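Equations (2) can be checked against curves whose constants are known in closed form. The sketch below (names ours) evaluates their left-hand sides; it relies on the integration by parts behind (2), which assumes that $y\,F(x)\,e^{\eta x}$ vanishes at both ends of the range. The test cases use the gamma density $x^{k-1}e^{-x}$, whose semi-invariants are $k, k, 2k, 6k$ and for which $F(x) = -x$, and Student's $t$ with $\nu = 10$ ($\lambda_2 = 5/4$, $\lambda_4 = 25/16$, $B_2 = -1/11$, $B_0 = -10/11$).

```python
from fractions import Fraction


def residuals_eq2(l1, l2, l3, l4, P, B0, B1, B2):
    """Left-hand sides of the four equations (2); all should be zero."""
    return (
        l1 - P + B1 - 2 * P * B2 + 2 * B2 * l1,
        (l2 + B0 - P * B1 + P * P * B2 + B1 * l1 - 2 * P * B2 * l1
         + 3 * B2 * l2 + B2 * l1 * l1),
        l3 + 2 * l2 * B1 - 4 * P * B2 * l2 + 4 * B2 * l3 + 4 * B2 * l1 * l2,
        (l4 + 3 * B1 * l3 - 6 * P * B2 * l3 + 5 * B2 * l4
         + 6 * B2 * l2 * l2 + 6 * B2 * l1 * l3),
    )


# Gamma density, origin at x = 0: mode P = k - 1, B2 = 0, B1 = -1, B0 = 1 - k.
k = Fraction(7, 2)
print(residuals_eq2(k, k, 2 * k, 6 * k, k - 1, 1 - k, -1, 0))
# Moving the origin to x = h changes only lambda_1 and P.
h = Fraction(5)
print(residuals_eq2(k - h, k, 2 * k, 6 * k, k - 1 - h, 1 - k, -1, 0))
# Student's t, nu = 10, origin at the mean (and mode).
print(residuals_eq2(0, Fraction(5, 4), 0, Fraction(25, 16), 0,
                    Fraction(-10, 11), 0, Fraction(-1, 11)))
```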

Since we can compute the moments from the raw statistics and the
semi-invariants from the moments, we may regard $\lambda_2$, $\lambda_3$ and $\lambda_4$ in these equations as
knowns and $B_0$, $B_1$, $B_2$, P and $\lambda_1$ as unknowns. But the origin has not yet
been specified. Let the origin be placed at the A.M., where $\mu_1' = \lambda_1 = 0$, and let $P_0$
denote the abscissa of the mode measured from the A.M. As
$\lambda_2$, $\lambda_3$, $\lambda_4$, $B_0$, $B_1$ and $B_2$ are unchanged by a change of origin, we have:

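$$\begin{aligned}
&B_1 - P_0 - 2P_0B_2 = 0,\\
&\lambda_2 + B_0 - P_0B_1 + P_0^2B_2 + 3B_2\lambda_2 = 0,\\
&\lambda_3 + 2B_1\lambda_2 - 4P_0B_2\lambda_2 + 4B_2\lambda_3 = 0,\\
&\lambda_4 + 3B_1\lambda_3 - 6P_0B_2\lambda_3 + 5B_2\lambda_4 + 6B_2\lambda_2^2 = 0.
\end{aligned} \qquad (3)$$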


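Now put

$$b_0' = B_0 - P_0B_1 + P_0^2B_2, \qquad b_1' = B_1 - 2P_0B_2, \qquad b_2' = B_2; \qquad (4)$$

then

$$\begin{aligned}
&b_1' - P_0 = 0,\\
&\lambda_2 + b_0' + 3b_2'\lambda_2 = 0,\\
&\lambda_3 + 2b_1'\lambda_2 + 4b_2'\lambda_3 = 0,\\
&\lambda_4 + 3b_1'\lambda_3 + 5b_2'\lambda_4 + 6b_2'\lambda_2^2 = 0.
\end{aligned} \qquad (5)$$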


By reversing the transformation (4) we get:

$$B_2 = b_2', \qquad B_1 = b_1' + 2P_0b_2', \qquad B_0 = b_0' + P_0(b_1' + P_0b_2'). \qquad (6)$$


Now the above theory suggests the following procedure for computing the
constants of a frequency curve: First the moments are computed about an
arbitrary origin, then the semi-invariants are computed (or alternatively the
moments about the A.M.; either step involves about the same amount of work),
then equations (5) are solved and then by means of equations (6) $B_2$,
$B_1$ and $B_0$ are computed. Next solve the quadratic equation

$$B_2X^2 + B_1X + B_0 = 0.$$

The character of the roots of this equation indicates which type to use, and it is
unnecessary to compute the criterion. The constants of the frequency curve
are simple functions of the roots of the above quadratic equation and can be
readily found by integrating the diff. eq. (1), being careful to write the solution
as a function of X = x − P. The rough checks mentioned in Part 2 can be
quickly and conveniently applied when this procedure is followed.
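The procedure can be sketched as follows. This is an illustration under stated assumptions, not the author's worksheet: the helper name is hypothetical, equations (5) are solved by eliminating $b_1'$ between their last two members, which gives $b_2' = (3\lambda_3^2 - 2\lambda_2\lambda_4)/(10\lambda_2\lambda_4 - 12\lambda_3^2 + 12\lambda_2^3)$ and $b_1' = -\lambda_3(1 + 4b_2')/(2\lambda_2)$, and a small tolerance `eps` decides when $B_2$ or $B_1$ is treated as zero. The roots are returned for inspection; mapping them to a named Type is left to the reader.

```python
import math


def constants_from_cumulants(l2, l3, l4, eps=1e-12):
    """Solve equations (5), apply (6), and find the real roots of
    B2 X^2 + B1 X + B0 = 0 (X measured from the mode).

    l2, l3, l4 are the semi-invariants; the origin is at the A.M.
    Returns P0 (mode measured from the A.M.), (B2, B1, B0) and the roots.
    """
    # Third and fourth equations of (5), solved for b2' and b1'.
    b2 = (3 * l3 ** 2 - 2 * l2 * l4) / (10 * l2 * l4 - 12 * l3 ** 2 + 12 * l2 ** 3)
    b1 = -l3 * (1 + 4 * b2) / (2 * l2)
    b0 = -l2 * (1 + 3 * b2)          # second equation of (5)
    P0 = b1                          # first equation of (5)
    # Equations (6).
    B2 = b2
    B1 = b1 + 2 * P0 * b2
    B0 = b0 + P0 * (b1 + P0 * b2)
    # Real roots; the quadratic may degenerate to a linear or constant form.
    if abs(B2) < eps:
        roots = [] if abs(B1) < eps else [-B0 / B1]
    else:
        disc = B1 * B1 - 4 * B2 * B0
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)     # a repeated root appears twice
            roots = sorted([(-B1 - s) / (2 * B2), (-B1 + s) / (2 * B2)])
    return P0, (B2, B1, B0), roots


# Gamma density with shape 3: semi-invariants 3, 6, 18.
print(constants_from_cumulants(3.0, 6.0, 18.0))
```

For the gamma example it gives $P_0 = -1$, $B_2 = 0$, $B_1 = -1$, $B_0 = -2$ and a single root at $X = -2$, i.e. at x = 0 on the original scale; $B_0 < 0$, so the rough check of Part 2 is passed.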

George Washington University.



A RECONSIDERATION OF SHEPPARD’S CORRECTIONS 

By W. T. Lewis¹

In computing the moments of a frequency distribution it is customary to find
first what are known as the raw moments. These are obtained on the
assumption that all the material of each class interval is concentrated at the middle
point of the interval. This introduces what is called a grouping error, because in
fact the material does not all lie at the middle point. To compensate for this
error W. F. Sheppard² derived a set of corrections. The hypothesis underlying
his method is that the distribution may be regarded as similar to one to which
the Euler-MacLaurin summation formula without its end terms may be applied.
He presupposed such a curve, found its true moments, and then the raw moments
that would be obtained if its area were concentrated at several equidistant
abscissae. The relationship between these raw moments and the true moments
of the curve furnished him with the corrections required for that distribution.
If now our observed distribution may be supposed to be sufficiently like that one,
we may use his corrections also on the observed data. One may note four points
of criticism.

(1) The given distribution may not be similar to the one suggested, in the 
sense that it would be close to such a curve if the intervals of grouping were 
made very small; or at all events the purpose of finding the moments may be in 
part to decide whether or not it would become such a curve, and so one would
not like to assume that to be true at the outset. A special case of importance 
in which this last is true occurs when one is finding the moments of a sample in 
order to determine whether it may have been drawn from a presupposed universe. 
It is inexact to use raw moments but it is illogical to use corrections that have 
been proved only for the universe being tested. 

(2) Sheppard's argument does not make use of the one certain fact that is 
given in the hypothesis, viz., that the partial area of the given distribution over
each class interval is exactly as stated. In fact, if, following the argument of 
some authors, the given curve be assumed to be exponential, it obviously cannot 
have partial areas everywhere exactly equal to the several given frequencies, 
for in particular its partial area is not zero beyond the given range. 

(3) It is common to find distributions which do not have high contact at the
ends of the range, and for them Sheppard's corrections certainly fail. To
obviate this criticism new corrections have been derived by Pairman and Pearson
for the so-called abrupt cases. These new corrections are adequate to care
for the abrupt cases but involve so much computation that it is a fair question
whether it would not be simpler, first to distribute the given material over each
interval by a smoothing process, and then to find without corrections the
moments of the smoothed distribution.

¹ With the assistance of Burton H. Camp.

² The true values are given on page 220 of "Mathematical Part of Elementary Statistics,"
by Camp, D. C. Heath and Company, 1931.

(4) Even if one admits Sheppard's method in general, waiving the dubious
question as to whether it is proper to start with an assumed curve instead of
starting with the given distribution, it is doubtful whether there are any curves
which have exactly the properties required. The high contact hypothesis may
be put in different language as follows: using the notation of the Handbook³,
page 92, let f(x) be the curve and $x_i$ be the middle point of the slice. It is
assumed that

$$c\sum_i x_i^t f^{(r)}(x_i) = \int x^t f^{(r)}(x)\,dx; \qquad t = 0, 1, \ldots; \quad r = 0, 1, \ldots;$$

c being the class interval. This means that if the moments of the curve be
found by using mid-ordinates times class interval, instead of areas, one will obtain
exactly the true moments of the curve, and that this will remain true for all the
curves which are derivatives of this curve. This property is certainly not true
of the normal curve; but it is almost true when r and the class interval are both
small, and it is probably due to this fact that Sheppard's corrections seem to be
good in practice.
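The remark that the property is "almost true" for the normal curve when the class interval is small can be illustrated numerically for r = 0. This sketch assumes a standard normal curve with class mid-points at $x_i = ic$ (the origin at a mid-point), and compares $c\sum_i x_i^t\varphi(x_i)$ with the true moments 1, 1 and 3 for t = 0, 2, 4; the truncation at $|x| \le 40$ is an arbitrary but ample cut-off.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def midordinate_moment(t, c, half_width=40.0):
    """c * sum_i x_i**t * phi(x_i) with mid-points x_i = i*c."""
    n = int(half_width / c)
    return c * sum((i * c) ** t * normal_pdf(i * c) for i in range(-n, n + 1))


true_moment = {0: 1.0, 2: 1.0, 4: 3.0}   # moments of the standard normal curve
for c in (0.5, 1.0, 2.0):
    errors = {t: midordinate_moment(t, c) - m for t, m in true_moment.items()}
    print(c, errors)
```

With c = 2 the mid-ordinate second moment falls short of 1 by roughly 0.13, while for c = 1 and c = 0.5 the discrepancy is far smaller.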

Moreover, this high contact hypothesis cannot be true for any function over a
limited range if the function is developable in Taylor's series about one end of the
range. For the only function which has the required properties is identically
zero, since the function and all its derivatives are required to vanish at that end
of the range.

The primary purpose of this paper, therefore, is to derive corrections similar to
Sheppard's with a different set of assumptions. The results may be used as an
approximate substitute for both Sheppard's and Pairman's. That is, they will
apply approximately to both extreme cases and to the intermediate cases; on the
whole they give better results than Sheppard's and are not so difficult to
administer as Pairman's.

The argument runs as follows. When a distribution is given merely by class
intervals, there is no way of knowing exactly what the distribution would have
been had the class intervals been smaller; we do not know that we have a sample
from an exponential curve, and even if we did we would not know that this
sample would lie close to the exponential in form. We shall, however, try to
draw a graduating curve in such a manner that (a) its partial area over each class
interval will equal the frequency of the given distribution over that interval;
and (b) its form within each class interval will be such that it will pass smoothly
into the adjacent portions to the right and left. A good way to do this is by a
freehand graph, frankly recognizing that there are many forms that will do
equally well. To obtain a numerical result it is necessary to use the equation
of some curve. Again frankly recognizing that there are many types which
will do equally well, we choose the simplest to handle:

$$y = a + bt + ct^2.$$

³ H. L. Rietz, "Handbook of Math. Stat.," Houghton Mifflin Co. (1924).

Let the relative frequency distribution be defined by f(i), −m ≤ i ≤ n, m, n, i
being integers. To satisfy (a) we have the equation

$$\int_{i-1/2}^{i+1/2} y\,dt = f(i).$$



Fig. 1 


To satisfy (b) we shall let

$$y = \tfrac12\left[f(i) + f(i+1)\right] \quad\text{if } t = i + \tfrac12.$$

The latter will hold for all values of i from −m to n − 1 inclusive, but the end
intervals require special treatment. Here, in order to satisfy as well as possible
both the high contact and the abrupt cases, we wish to let the material be
distributed according to the way the curve is behaving over the two nearest
intervals on the right (at n) or left (at −m), rather than by the addition of zero
frequencies beyond the given limits. To do this we let the slope of the parabolas
be zero at the extremes:

$$\frac{dy}{dt} = 0 \quad\text{at } t = -m - \tfrac12 \text{ and } t = n + \tfrac12.$$
Then, if for example the frequencies are increasing as one nears the right end
interval, the curve will rise over the right end interval; if they are decreasing,
it will fall. These three conditions are sufficient to determine a continuous
curve of the sort indicated in the figure. The exact moments of the curve may
be found by integration and expressed in terms of the raw moments. The
details are tedious and of an elementary nature and will be given only for the
mean value $\bar\nu_1$.

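To determine the coefficients of the parabola $y = a + bt + ct^2$ for the rectangle
at t = i we may write the following three equations, the first complying with the
requirement that the area under the parabola from $t = i - \tfrac12$ to $t = i + \tfrac12$ equals
the area of the rectangle at t = i, the second and third giving the ordinates at
$t = i + \tfrac12$ and $t = i - \tfrac12$ respectively:

$$f(i) = \int_{i-1/2}^{i+1/2}(a + bt + ct^2)\,dt,$$

$$\frac{f(i) + f(i+1)}{2} = a + b\left(i + \tfrac12\right) + c\left(i + \tfrac12\right)^2,$$

$$\frac{f(i) + f(i-1)}{2} = a + b\left(i - \tfrac12\right) + c\left(i - \tfrac12\right)^2.$$

Solving these three simultaneous equations we get for a, b, and c:

$$a = \left(\tfrac54 - 3i^2\right)f(i) + \left(\tfrac32 i^2 - \tfrac12 i - \tfrac18\right)f(i+1) + \left(\tfrac32 i^2 + \tfrac12 i - \tfrac18\right)f(i-1),$$

$$b = 6i\,f(i) + \left(\tfrac12 - 3i\right)f(i+1) - \left(\tfrac12 + 3i\right)f(i-1),$$

$$c = -3f(i) + \tfrac32 f(i+1) + \tfrac32 f(i-1),$$

and these hold for −m + 1 ≤ i ≤ n − 1.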

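For the parabola $y = a_1 + b_1t + c_1t^2$ over the first rectangle, i.e., where
i = −m, we get the equations:

$$f(-m) = \int_{-m-1/2}^{-m+1/2}(a_1 + b_1t + c_1t^2)\,dt,$$

$$\frac{f(-m) + f(-m+1)}{2} = a_1 + b_1\left(-m + \tfrac12\right) + c_1\left(-m + \tfrac12\right)^2,$$

$$b_1 + 2c_1\left(-m - \tfrac12\right) = 0,$$

and their solutions:

$$a_1 = \tfrac34\left(m^2 + m - \tfrac1{12}\right)f(-m+1) - \tfrac34\left(m^2 + m - \tfrac{17}{12}\right)f(-m),$$

$$b_1 = \tfrac34(2m + 1)f(-m+1) - \tfrac34(2m + 1)f(-m),$$

$$c_1 = \tfrac34 f(-m+1) - \tfrac34 f(-m).$$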
Similarly, for the parabola $y = a_n + b_nt + c_nt^2$ through the last rectangle at
i = n we get

$$f(n) = \int_{n-1/2}^{n+1/2}(a_n + b_nt + c_nt^2)\,dt,$$

$$\frac{f(n) + f(n-1)}{2} = a_n + b_n\left(n - \tfrac12\right) + c_n\left(n - \tfrac12\right)^2,$$

$$b_n + 2c_n\left(n + \tfrac12\right) = 0,$$

and for the constants

$$a_n = \tfrac34\left(n^2 + n - \tfrac1{12}\right)f(n-1) - \tfrac34\left(n^2 + n - \tfrac{17}{12}\right)f(n),$$

$$b_n = -\tfrac34(1 + 2n)f(n-1) + \tfrac34(1 + 2n)f(n),$$

$$c_n = \tfrac34 f(n-1) - \tfrac34 f(n).$$
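The graduating curve is easy to build and integrate exactly. The sketch below (helper names hypothetical) works on each class in the local variable $s = t - i$, writing the parabola as $\alpha + \beta s + \gamma s^2$. Solving the same three conditions in that variable gives, for an interior class, $\gamma = \tfrac32[f(i+1) + f(i-1)] - 3f(i)$, $\beta = \tfrac12[f(i+1) - f(i-1)]$ and $\alpha = \tfrac54 f(i) - \tfrac18[f(i+1) + f(i-1)]$, which expand to the a, b, c above; for an end class $\gamma = \tfrac34(f_{\text{inner}} - f_{\text{end}})$, $\beta = \pm\gamma$ and $\alpha = f_{\text{end}} - \tfrac1{16}(f_{\text{inner}} - f_{\text{end}})$. Rational arithmetic keeps the moments exact.

```python
from fractions import Fraction
from math import comb


def local_coefficients(f):
    """(alpha, beta, gamma) of y = alpha + beta*s + gamma*s**2 on each class, s = t - i.

    f lists the relative frequencies f(-m), ..., f(n) (at least 3 classes).
    End classes have zero slope at t = -m - 1/2 and t = n + 1/2.
    """
    f = [Fraction(v) for v in f]
    N = len(f)
    if N < 3:
        raise ValueError("need at least three classes")
    coeffs = []
    for j in range(N):
        if j in (0, N - 1):
            inner = f[1] if j == 0 else f[N - 2]
            gamma = Fraction(3, 4) * (inner - f[j])
            beta = gamma if j == 0 else -gamma
            alpha = f[j] - (inner - f[j]) / 16
        else:
            lo, mid, hi = f[j - 1], f[j], f[j + 1]
            gamma = Fraction(3, 2) * (hi + lo) - 3 * mid
            beta = (hi - lo) / 2
            alpha = Fraction(5, 4) * mid - (hi + lo) / 8
        coeffs.append((alpha, beta, gamma))
    return coeffs


def s_integral(p):
    """Integral of s**p over [-1/2, 1/2]."""
    return Fraction(0) if p % 2 else Fraction(1, 2 ** p * (p + 1))


def curve_moment(f, m, k):
    """k-th moment about the given origin of the graduating curve (class i centred at t = i)."""
    total = Fraction(0)
    for j, (alpha, beta, gamma) in enumerate(local_coefficients(f)):
        i = Fraction(j - m)
        for p in range(k + 1):
            # (i + s)**k expanded binomially, integrated against the parabola.
            weight = comb(k, p) * i ** (k - p)
            total += weight * (alpha * s_integral(p) + beta * s_integral(p + 1)
                               + gamma * s_integral(p + 2))
    return total


f = [Fraction(1, 7), Fraction(5, 7), Fraction(1, 7)]   # m = n = 1
print(curve_moment(f, 1, 0), curve_moment(f, 1, 1), curve_moment(f, 1, 2))  # 1 0 31/140
```

On this symmetric distribution (the one used in the worked comparison later in the paper) the curve has total area 1, mean 0 and second moment $31/140 \approx 0.2214$.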

Having obtained the constants for the graduating curve, we will determine
the moments of this curve in terms of those of the given frequency distribution.

Notation: Let the class interval be c = 1; let $\nu_s = \sum_{i=-m}^{n} i^s f(i)$ be the uncorrected
sth moment of the given frequency distribution about the given origin;
let $\mu_s = \sum_{i=-m}^{n} (i - \nu_1)^s f(i)$ be the uncorrected sth moment of the given frequency
distribution about its uncorrected mean; let $\bar\nu_s$ be the corrected value of
the sth moment about the given origin; and let $\bar\mu_s$ be the corrected value of
the sth moment about the corrected mean. Thus $\nu_s$ and $\mu_s$ apply to the rectangles,
and $\bar\nu_s$ and $\bar\mu_s$ apply to the curves as follows (a, b, c being the coefficients for
the rectangle at t = i):

$$\bar\nu_s = \sum_{i=-m+1}^{n-1}\int_{i-1/2}^{i+1/2} t^s(a + bt + ct^2)\,dt + \int_{-m-1/2}^{-m+1/2} t^s(a_1 + b_1t + c_1t^2)\,dt + \int_{n-1/2}^{n+1/2} t^s(a_n + b_nt + c_nt^2)\,dt,$$

$$\bar\mu_s = \sum_{i=-m+1}^{n-1}\int_{i-1/2}^{i+1/2}(t - \bar\nu_1)^s(a + bt + ct^2)\,dt + \int_{-m-1/2}^{-m+1/2}(t - \bar\nu_1)^s(a_1 + b_1t + c_1t^2)\,dt + \int_{n-1/2}^{n+1/2}(t - \bar\nu_1)^s(a_n + b_nt + c_nt^2)\,dt.$$

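Using these symbols we have for the first moment about the given origin

$$\bar\nu_1 = \sum_{i=-m+1}^{n-1}\int_{i-1/2}^{i+1/2} t(a + bt + ct^2)\,dt + \int_{-m-1/2}^{-m+1/2} t(a_1 + b_1t + c_1t^2)\,dt + \int_{n-1/2}^{n+1/2} t(a_n + b_nt + c_nt^2)\,dt$$

$$= \sum_{i=-m+1}^{n-1}\left[a\,i + b\left(i^2 + \tfrac1{12}\right) + c\left(i^3 + \tfrac{i}{4}\right)\right] + \left[-a_1m + b_1\left(m^2 + \tfrac1{12}\right) - c_1\left(m^3 + \tfrac{m}{4}\right)\right] + \left[a_nn + b_n\left(n^2 + \tfrac1{12}\right) + c_n\left(n^3 + \tfrac{n}{4}\right)\right].$$

Substituting the values for the constants, this becomes

$$\begin{aligned}
\bar\nu_1 = {}&\sum_{i=-m+1}^{n-1}\Big\{i\Big[\left(\tfrac54 - 3i^2\right)f(i) + \left(\tfrac32 i^2 - \tfrac12 i - \tfrac18\right)f(i+1) + \left(\tfrac32 i^2 + \tfrac12 i - \tfrac18\right)f(i-1)\Big]\\
&\quad + \left(i^2 + \tfrac1{12}\right)\Big[6i\,f(i) + \left(\tfrac12 - 3i\right)f(i+1) - \left(\tfrac12 + 3i\right)f(i-1)\Big] + \left(i^3 + \tfrac{i}{4}\right)\Big[-3f(i) + \tfrac32 f(i+1) + \tfrac32 f(i-1)\Big]\Big\}\\
&+ \Big\{-m\Big[\tfrac34\left(m^2 + m - \tfrac1{12}\right)f(-m+1) - \tfrac34\left(m^2 + m - \tfrac{17}{12}\right)f(-m)\Big]\\
&\quad + \left(m^2 + \tfrac1{12}\right)\Big[\tfrac34(2m+1)f(-m+1) - \tfrac34(2m+1)f(-m)\Big] - \left(m^3 + \tfrac{m}{4}\right)\Big[\tfrac34 f(-m+1) - \tfrac34 f(-m)\Big]\Big\}\\
&+ \Big\{n\Big[\tfrac34\left(n^2 + n - \tfrac1{12}\right)f(n-1) - \tfrac34\left(n^2 + n - \tfrac{17}{12}\right)f(n)\Big]\\
&\quad + \left(n^2 + \tfrac1{12}\right)\Big[-\tfrac34(1+2n)f(n-1) + \tfrac34(1+2n)f(n)\Big] + \left(n^3 + \tfrac{n}{4}\right)\Big[\tfrac34 f(n-1) - \tfrac34 f(n)\Big]\Big\},
\end{aligned}$$

$$\bar\nu_1 = \sum_{i=-m+1}^{n-1} i f(i) + \frac1{24}\sum_{i=-m+1}^{n-1} f(i+1) - \frac1{24}\sum_{i=-m+1}^{n-1} f(i-1) + \frac1{16}f(-m+1) - \left(m + \frac1{16}\right)f(-m) - \frac1{16}f(n-1) + \left(n + \frac1{16}\right)f(n).$$

Now

$$\sum_{i=-m+1}^{n-1} i f(i) = \sum_{i=-m}^{n} i f(i) - (-m)f(-m) - nf(n) = \nu_1 + mf(-m) - nf(n),$$

$$\frac1{24}\sum_{i=-m+1}^{n-1} f(i+1) = \frac1{24}\sum_{k=-m+2}^{n} f(k) = \frac1{24} - \frac1{24}f(-m+1) - \frac1{24}f(-m),$$

$$\frac1{24}\sum_{i=-m+1}^{n-1} f(i-1) = \frac1{24}\sum_{k=-m}^{n-2} f(k) = \frac1{24} - \frac1{24}f(n-1) - \frac1{24}f(n),$$

since $\sum f(i) = 1$ for a relative frequency distribution. Hence

$$\bar\nu_1 = \nu_1 + mf(-m) - nf(n) + \frac1{24} - \frac1{24}f(-m+1) - \frac1{24}f(-m) - \frac1{24} + \frac1{24}f(n-1) + \frac1{24}f(n) + \frac1{16}f(-m+1) - \left(m + \frac1{16}\right)f(-m) - \frac1{16}f(n-1) + \left(n + \frac1{16}\right)f(n),$$

that is,

$$\bar\nu_1 = \nu_1 - \frac5{48}f(-m) + \frac5{48}f(n) + \frac1{48}f(-m+1) - \frac1{48}f(n-1).$$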
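The closed form for $\bar\nu_1$ can be compared with a class-by-class computation. In the sketch below (names hypothetical) each class contributes $i\,f(i) + \beta_i/12$, where $\beta_i$ is the slope coefficient of its parabola in the local variable $s = t - i$: $\tfrac12[f(i+1) - f(i-1)]$ for an interior class, $\tfrac34[f(-m+1) - f(-m)]$ for the left end and $-\tfrac34[f(n-1) - f(n)]$ for the right end. The two $1/24$ terms in the derivation cancel, so the comparison does not depend on the frequencies summing to 1.

```python
from fractions import Fraction


def corrected_mean_direct(f, m):
    """First moment of the graduating curve, class by class.

    f lists f(-m), ..., f(n); class j is centred at t = j - m.
    """
    f = [Fraction(v) for v in f]
    N = len(f)
    total = Fraction(0)
    for j, fj in enumerate(f):
        if j == 0:
            beta = Fraction(3, 4) * (f[1] - f[0])           # left end class
        elif j == N - 1:
            beta = -Fraction(3, 4) * (f[N - 2] - f[N - 1])  # right end class
        else:
            beta = (f[j + 1] - f[j - 1]) / 2                # interior class
        total += (j - m) * fj + beta / 12
    return total


def corrected_mean_formula(f, m):
    """nu1 - 5/48 f(-m) + 5/48 f(n) + 1/48 f(-m+1) - 1/48 f(n-1)."""
    f = [Fraction(v) for v in f]
    nu1 = sum((j - m) * fj for j, fj in enumerate(f))
    return (nu1 - Fraction(5, 48) * f[0] + Fraction(5, 48) * f[-1]
            + Fraction(1, 48) * f[1] - Fraction(1, 48) * f[-2])


g = [Fraction(1, 10), Fraction(3, 10), Fraction(4, 10), Fraction(2, 10)]  # m = 1, n = 2
print(corrected_mean_direct(g, 1), corrected_mean_formula(g, 1))  # 17/24 17/24
```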


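Using this same notation and method for the higher moments we get

$$\bar\mu_2 = \nu_2 - \bar\nu_1^2 - \frac1{12} + \left(\frac{5m}{24} + \frac7{80}\right)f(-m) + \left(\frac{5n}{24} + \frac7{80}\right)f(n) - \left(\frac{m}{24} + \frac1{240}\right)f(-m+1) - \left(\frac{n}{24} + \frac1{240}\right)f(n-1),$$

$$\begin{aligned}
\bar\mu_3 = {}&\nu_3 - 3\bar\nu_1\bar\mu_2 - \frac{\bar\nu_1}{4} - \bar\nu_1^3 - \left(\frac{5m^2}{16} + \frac{21m}{80} + \frac{17}{120}\right)f(-m) + \left(\frac{5n^2}{16} + \frac{21n}{80} + \frac{17}{120}\right)f(n)\\
&+ \left(\frac{m^2}{16} + \frac{m}{80} + \frac1{120}\right)f(-m+1) - \left(\frac{n^2}{16} + \frac{n}{80} + \frac1{120}\right)f(n-1),
\end{aligned}$$

$$\begin{aligned}
\bar\mu_4 = {}&\nu_4 - 4\bar\mu_3\bar\nu_1 - 6\bar\mu_2\bar\nu_1^2 - \bar\nu_1^4 - \frac{\bar\mu_2}{2} - \frac{\bar\nu_1^2}{2} - \frac{17}{64}\\
&+ f(-m)\left[\frac{5m^3}{12} + \frac{21m^2}{40} + \frac{17m}{30} + \frac{313}{1680}\right] + f(n)\left[\frac{5n^3}{12} + \frac{21n^2}{40} + \frac{17n}{30} + \frac{313}{1680}\right]\\
&- f(-m+1)\left[\frac{m^3}{12} + \frac{m^2}{40} + \frac{m}{30} + \frac1{336}\right] - f(n-1)\left[\frac{n^3}{12} + \frac{n^2}{40} + \frac{n}{30} + \frac1{336}\right].
\end{aligned}$$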


SPECIAL CASES 

The above formulae are rather long and in practice the special cases below 
will frequently be preferred. 

(a) We may usually take the origin at or very near the middle of the range so 
that m = n, at least approximately. 





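If m = n:

$$\bar\nu_1 = \nu_1 + \frac5{48}\left[f(n) - f(-n)\right] + \frac1{48}\left[f(-n+1) - f(n-1)\right],$$

$$\bar\mu_2 = \nu_2 - \frac1{12} - \bar\nu_1^2 + \left(\frac{5n}{24} + \frac7{80}\right)\left[f(-n) + f(n)\right] - \left(\frac{n}{24} + \frac1{240}\right)\left[f(-n+1) + f(n-1)\right],$$

$$\bar\mu_3 = \nu_3 - 3\bar\nu_1\bar\mu_2 - \frac{\bar\nu_1}{4} - \bar\nu_1^3 + \left(\frac{5n^2}{16} + \frac{21n}{80} + \frac{17}{120}\right)\left[f(n) - f(-n)\right] + \left(\frac{n^2}{16} + \frac{n}{80} + \frac1{120}\right)\left[f(-n+1) - f(n-1)\right],$$

$$\begin{aligned}
\bar\mu_4 = {}&\nu_4 - 4\bar\mu_3\bar\nu_1 - 6\bar\mu_2\bar\nu_1^2 - \bar\nu_1^4 - \frac{\bar\mu_2}{2} - \frac{\bar\nu_1^2}{2} - \frac{17}{64}\\
&- \left[\frac{n^3}{12} + \frac{n^2}{40} + \frac{n}{30} + \frac1{336}\right]\left[f(-n+1) + f(n-1)\right] + \left[\frac{5n^3}{12} + \frac{21n^2}{40} + \frac{17n}{30} + \frac{313}{1680}\right]\left[f(-n) + f(n)\right].
\end{aligned}$$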

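(b) Except in the abrupt cases the end frequencies and the difference between
those next to the ends will be so small (relative to unity) that they will have a
negligible effect on the corrections. If m = n as in (a), and if also

$$f(-m) = f(n) = 0 \quad\text{and}\quad f(-m+1) - f(n-1) = 0:$$

$$\bar\nu_1 = \nu_1,$$

$$\bar\mu_2 = \nu_2 - \nu_1^2 - \frac1{12} - \left(\frac{m}{12} + \frac1{120}\right)f(-m+1),$$

$$\bar\mu_3 = \nu_3 - 3\nu_1\bar\mu_2 - \frac{\nu_1}{4} - \nu_1^3,$$

$$\bar\mu_4 = \nu_4 - 4\bar\mu_3\nu_1 - 6\bar\mu_2\nu_1^2 - \nu_1^4 - \frac{\bar\mu_2}{2} - \frac{\nu_1^2}{2} - \frac{17}{64} - \left(\frac{m^3}{6} + \frac{m^2}{20} + \frac{m}{15} + \frac1{168}\right)f(-m+1).$$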




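These formulae have been written in the form which makes the computing
simple. The following makes a comparison with Sheppard's corrections easy:

$$\bar\mu_2 = \mu_2 - \frac1{12} - \left(\frac{m}{12} + \frac1{120}\right)f(-m+1),$$

$$\bar\mu_3 = \mu_3 + \left(\frac{m\nu_1}{4} + \frac{\nu_1}{40}\right)f(-m+1),$$

$$\bar\mu_4 = \mu_4 - \frac{\mu_2}{2} - \frac{43}{192} - \left(\frac{m^3}{6} + \frac{m^2}{20} + \frac{m}{40} + \frac1{560} + \frac{m\nu_1^2}{2} + \frac{\nu_1^2}{20}\right)f(-m+1).$$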



The following special case is also useful in comparing my formulae with
Sheppard's.

(c) Let $f(-m) = \tfrac15 f(-m+1)$ and $f(n) = \tfrac15 f(n-1)$. This produces a
graduating curve which is exactly tangent to the t-axis at the ends of the range
(the end parabola then takes the value $\tfrac14[5f(-m) - f(-m+1)] = 0$ at $t = -m - \tfrac12$,
where its slope is already zero, and similarly at $t = n + \tfrac12$)
and is everywhere continuous, though it does not have continuous derivatives
at certain isolated points. It is, however, a curve which to the eye cannot be
distinguished from the type assumed in the Euler-MacLaurin theorem, which
lies at the base of Sheppard's formulae. My corrections become:


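$$\bar\nu_1 = \nu_1,$$

$$\bar\mu_2 = \mu_2 - \frac1{12} + \frac1{15}\left[f(-m) + f(n)\right],$$

$$\bar\mu_3 = \mu_3 - \frac{\nu_1}{5}\left[f(-m) + f(n)\right] - \left(\frac{m}{5} + \frac1{10}\right)f(-m) + \left(\frac{n}{5} + \frac1{10}\right)f(n),$$

$$\begin{aligned}
\bar\mu_4 = {}&\mu_4 - \frac{\mu_2}{2} - \frac{43}{192} + \left[\frac25(m^2 + m) + \frac{29}{210} + \left(\frac{4m}{5} + \frac25\right)\nu_1 + \frac25\nu_1^2\right]f(-m)\\
&+ \left[\frac25(n^2 + n) + \frac{29}{210} - \left(\frac{4n}{5} + \frac25\right)\nu_1 + \frac25\nu_1^2\right]f(n).
\end{aligned}$$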

Sheppard's are:

$$\bar\nu_1 = \nu_1, \qquad \bar\mu_2 = \mu_2 - \frac1{12}, \qquad \bar\mu_3 = \mu_3, \qquad \bar\mu_4 = \mu_4 - \frac{\mu_2}{2} + \frac7{240}.$$

Let us compare my results with Sheppard's in the very special case in which $f(-m) = f(n) = 1/7$, $f(0) = 5/7$, $m = n = 1$. The odd moments vanish. My corrected values of $\mu_2$ and $\mu_4$ are

$$\mu_2 = 0.2214, \qquad \mu_4 = 0.1870.$$

Sheppard's are

$$\mu_2 = 0.2024, \qquad \mu_4 = 0.1720.$$

The numerical difference between the $\mu_2$'s is 0.0190, and the numerical difference between the $\mu_4$'s is 0.0150.
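The Sheppard figures in this comparison can be reproduced directly. The Python sketch below assumes a unit class interval and the frequencies 1/7, 5/7, 1/7 at t = −1, 0, 1. It computes the raw moments and applies Sheppard's corrections exactly, using fractions. The author's own corrected values (0.2214 and 0.1870) depend on formulae that are not legible in this section. They are therefore copied from the text rather than recomputed. The helper name `sheppard` is illustrative only.

```python
from fractions import Fraction as Fr

def sheppard(m2, m4):
    """Sheppard's corrections for a unit class interval; returns (mu2, mu4)."""
    return m2 - Fr(1, 12), m4 - m2 / 2 + Fr(7, 240)

# Frequencies of the three classes t = -1, 0, 1 (m = n = 1).
freq = {-1: Fr(1, 7), 0: Fr(5, 7), 1: Fr(1, 7)}
mean = sum(t * f for t, f in freq.items())               # 0, so odd moments vanish
m2 = sum((t - mean) ** 2 * f for t, f in freq.items())   # raw second moment, 2/7
m4 = sum((t - mean) ** 4 * f for t, f in freq.items())   # raw fourth moment, 2/7

mu2_s, mu4_s = sheppard(m2, m4)
print(float(mu2_s), float(mu4_s))   # about 0.20238 and 0.17202

# The author's corrected values, copied from the text (not recomputed here).
mu2_lewis, mu4_lewis = 0.2214, 0.1870
print(round(mu2_lewis - float(mu2_s), 4), round(mu4_lewis - float(mu4_s), 4))
```

The first line of output confirms Sheppard's 0.2024 and 0.1720. The second reproduces the differences 0.0190 and 0.0150.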

This example shows that Sheppard's corrections are not valid to the precision to which they are usually given if they are to be used for the purpose of correcting raw moments. The last term in the fourth-moment correction, 7/240, might equally well be, for example, −43/192, as in my special case. This will become more evident to the reader if he will draw the curves indicated in this example. To the eye the curve will appear exactly like the kind specified in the Euler-MacLaurin theorem; for example, much like the normal curve. Now suppose one adopted for the moment the point of view (which I have criticized earlier) of starting with the curve used in this example, breaking it up into three partial areas, and then finding the relation between the true and the raw moments. The partial areas found would be exactly those used in this example, and this method would give us Sheppard's corrections. But they would not be exactly correct, for in this instance my formulae give exactly the relationship between the true and the raw moments. The difference is due to the fact that in this instance the assumptions permitting the use of the Euler-MacLaurin theorem in abbreviated form are not justified for this curve. But there is no way of telling at the outset, if one has given initially only the partial areas, whether precisely this curve or another which to the eye would appear very much like it is truly the curve which will graduate the same material when subjected to a finer classification.



THE POINT BINOMIAL AND PROBABILITY PAPER 

By Frank H. Byron¹

1. An approximation to the sum of a number of consecutive terms of the point binomial may be found graphically and quite expeditiously by means of so-called “probability paper.” This paper is ruled so that the (x, y) graph of the equation of the integral of the normal curve



is a straight line. Let the successive terms of the point binomial be represented as follows:

$$(p + q)^n = u_0 + u_1 + \cdots + u_t + \cdots + u_n, \tag{2}$$

where $u_t = {}_nC_t\, p^{n-t} q^t$ and $p \geq q$. Then the $(x, y)$ graph of the equation

$$y = \sum_{i=0}^{t} u_i, \qquad t + \tfrac{1}{2} = x, \tag{3}$$

i.e., of the sum of the first $(t + 1)$ terms of this point binomial, is, in all but extreme cases, a set of points lying on a gently turning curve. The curve turns so gently that its form may be represented closely by two straight lines, each passing through the median point, as will be explained in the next section. As paper of this sort is readily obtainable, and as this method yields as great accuracy as is really useful in many problems, it is suggested that its use ought to be quite general.
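Probability paper does by its ruling what the inverse of the normal integral does numerically. The Python sketch below is a numerical stand-in for the paper, not something found in the original. It computes the partial sums of (2) and converts each one to a normal deviate with `statistics.NormalDist().inv_cdf`, pairing it with x = t + ½ as in (3). The example binomial is $(\tfrac23 + \tfrac13)^{25}$. This choice is an assumption, made because it reproduces the median 8.278 and the true sum 0.222 quoted in the illustration of §5. The function names are hypothetical.

```python
from math import comb
from statistics import NormalDist

def partial_sums(n, p, q):
    """S_{t+1} = u_0 + ... + u_t for (p + q)^n, with u_t = C(n, t) p^(n-t) q^t."""
    sums, running = [], 0.0
    for t in range(n + 1):
        running += comb(n, t) * p ** (n - t) * q ** t
        sums.append(running)
    return sums

def probability_paper_points(n, p, q):
    """(x, z) pairs: x = t + 1/2 and z = normal deviate of the partial sum.
    Plotted on ordinary graph paper, these points mimic probability paper."""
    std = NormalDist()
    points = []
    for t, s in enumerate(partial_sums(n, p, q)):
        if 0.0 < s < 1.0:            # a sum of exactly 1 has no finite deviate
            points.append((t + 0.5, std.inv_cdf(s)))
    return points

for x, z in probability_paper_points(25, 2 / 3, 1 / 3)[3:13]:
    print(f"x = {x:4.1f}   z = {z:+.3f}")
```

Plotting the printed (x, z) pairs on ordinary graph paper shows the gently turning, nearly straight run of points that the text describes.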

2. Sheppard’s Corrections. The formulae for the moments of the point 
binomial, mean = qn, σ² = pqn, are exact without any corrections such as are
used for grouped material. This fact has led us all (apparently) to assume that 
in fitting the curve to the point binomial one would get a better fit by equating 
the moments of the curve to the uncorrected moments of the point binomial 
rather than to the corrected moments. The studies made in connection with 
the preparation of this paper show that when the purpose is to equate areas to 
sums of terms the corrected moments should be used. The theoretical basis 
for this conclusion is as follows : 

To simplify the argument let us suppose that one were seeking that curve of 
Charlier type, 

$$F(x) = c_0\phi_0(x) + \cdots + c_4\phi_4(x), \tag{4}$$

¹ With the assistance of Burton H. Camp.

(where $\phi_0$ is the normal curve and $\phi_1, \phi_2, \cdots$ its successive derivatives) whose integral would best fit the graph of (3). Since fitting is required only at the isolated points $x = \tfrac12, 1\tfrac12, 2\tfrac12, \cdots$, it is clear that one might obtain this curve by the two following steps.

First let $f(x)$ be any function whose integral meets the requirement exactly at these isolated points. What values this integral has at other points does not for the moment concern us. There are an infinite number of such $f(x)$ curves. Next let the $c$'s of (4) be so chosen that $F(x)$ will fit $f(x)$ as nearly as possible.

The ordinary derivation of the $c$'s supposes that the fit between $f(x)$ and $F(x)$ is to be made by least squares, the residuals being weighted by the factor $1/\sqrt{\phi_0(x)}$. No matter what $f(x)$ is chosen, the $c$'s can be determined so that the weighted integral of $(f(x) - F(x))^2$ will be a minimum, but the value of this minimum will vary from one $f(x)$ to another. We now desire to select that $f(x)$ which will make this minimum value as small as possible. It is reasonable to suppose that our best selection will be some $f(x)$ which is as kindred to the nature of $F(x)$ as possible.

We shall not therefore choose an $f(x)$ which oscillates wildly between the points where perfect fitting is required (Fig. 1), nor yet an $f(x)$ which is made up of the top bases of the point binomial histogram. We shall prefer a modification (Fig. 2) of that histogram by a smoothing process. Such an $f(x)$ will not have the exact moments of the point binomial but, more nearly, those moments corrected for grouping. Then the determination of the $c$'s will come out in terms of these corrected moments, not in terms of the uncorrected moments. (In fact the uncorrected moments would be the exact moments of an $f(x)$ having an oscillatory character between the important points.)

Of course, when n is large, the difference is too small to be noticed and the use of Sheppard's corrections is not worth while. Since n usually is large when approximations of this sort are needed, the point is not usually important. It was important, however, in the computation of the tables of §4. Moreover, the use of Sheppard's corrections does not invariably yield better results, the gain being masked sometimes by other effects to be considered in §3. An excellent illustration of uniformly better results is in fitting $(\tfrac12 + \tfrac12)^9$ by a curve of type (4). The errors in the sums as derived from (4), with and without the corrections, are as follows:

 t    With Corrections    Without Corrections
 0         .0002                .0007
 1         .0001                .0022
 2        -.0003                .0039
 3        -.0001                .0036
 4         .0000                .0000
 5         .0001               -.0036
 6         .0003               -.0039
 7        -.0001               -.0022
 8        -.0002               -.0007
 9         .0000               -.0001
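Under stated assumptions, a table of this kind can be regenerated approximately for $(\tfrac12 + \tfrac12)^9$, the binomial whose ten partial sums t = 0, …, 9 appear above. The sketch below reads "a curve of type (4)" as the Charlier curve $\phi_0 + c_4\phi_4$, with $c_3 = 0$ because $p = q$. It takes the mean, $\mu_2$ and $\mu_4$ from the point binomial, with or without Sheppard's corrections. It then sets $c_4 = \gamma_2/24$ in standard units, where $\gamma_2 = \mu_4/\mu_2^2 - 3$, and integrates the curve up to $x = t + \tfrac12$. These modelling choices are assumptions, and the function name is hypothetical.

```python
from math import comb
from statistics import NormalDist

def symmetric_charlier_errors(n, sheppard):
    """Errors (fitted - true) in u_0 + ... + u_t for (1/2 + 1/2)^n, fitting
    phi_0 + c_4 phi_4 (c_3 = 0 by symmetry) and integrating up to x = t + 1/2."""
    u = [comb(n, t) / 2 ** n for t in range(n + 1)]
    mean = sum(t * ut for t, ut in enumerate(u))
    m2 = sum((t - mean) ** 2 * ut for t, ut in enumerate(u))
    m4 = sum((t - mean) ** 4 * ut for t, ut in enumerate(u))
    if sheppard:
        # Sheppard's corrections for unit class interval (m4 uses the raw m2).
        m2, m4 = m2 - 1 / 12, m4 - m2 / 2 + 7 / 240
    sigma = m2 ** 0.5
    gamma2 = m4 / m2 ** 2 - 3          # excess; c_4 = gamma2 / 24 in standard units
    std = NormalDist()
    errors, true_sum = [], 0.0
    for t in range(n + 1):
        true_sum += u[t]
        z = (t + 0.5 - mean) / sigma
        # Integral of phi_0 + c_4 phi_4 up to z is Phi(z) + c_4 phi_3(z),
        # where phi_3(z) = -(z**3 - 3*z) * phi_0(z).
        fitted = std.cdf(z) - gamma2 / 24 * (z ** 3 - 3 * z) * std.pdf(z)
        errors.append(fitted - true_sum)
    return errors

with_c = symmetric_charlier_errors(9, sheppard=True)
without_c = symmetric_charlier_errors(9, sheppard=False)
for t, (a, b) in enumerate(zip(with_c, without_c)):
    print(f"{t}  {a:+.4f}  {b:+.4f}")
```

Under these assumptions the uncorrected column comes out close to the tabled one; for example, the error at t = 3 is near +.0036. The corrected errors all stay within a few units in the fourth decimal place.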


3. The Stubby End. The other effects which mask this improvement are especially noticeable at the stubby end of a point binomial. We have to keep in mind here that the approximating curve (such as (4)) is required to turn a sharp corner. Because of the least-squares method of fitting, it is just as important that the curve be close to zero when t is negative as it is that it be close to $u_0, u_1, \cdots$ when t is positive. Therefore, in order to turn this corner it has to dip below the x-axis in the neighborhood of $t = -\tfrac12$. This makes the approximating curve too low just to the right of $t = -\tfrac12$ unless the whole curve be arbitrarily widened. This arbitrary widening is customarily performed by not using Sheppard's correction for $\mu_2$. The result is a betterment of the fit at these points but a corresponding loss over the rest of the infinite interval. A good example² is (I + …). The fit is worse at the left end when Sheppard's corrections are used, but better over the rest of the interval.

The same difficulty arises in another connection. Compare the closeness of fit to a point binomial made by F(x) as written in (4) with that made by F(x) as it would be written if $c_4$ were zero. It often happens (as is well known) that the latter is actually slightly better on the average. How can this be true if the $c$'s are chosen by the method of least squares, and the best choice as thus indicated makes $c_4$ different from zero? The answer is that the $c$'s are chosen so that the fit is best over the infinite interval, not merely over the interval from $t = -\tfrac12$ to $t = n + \tfrac12$. Furthermore, the distant points are weighted more heavily than those near the center. Thus a choice other than the least-squares choice, and one in which $c_4$ would be zero, might be better for the restricted interval covered by the point binomial. This does happen, especially when the abruptness of the stubby end of a very skew binomial forces the curve to dip below the axis in order to get by a sharp corner. A good example is the problem considered by Fry:³ (yq + …). All the effects mentioned are present here. When $c_4$ is not equal to zero, the fit is on the average a little worse over the point binomial interval and a little better over the infinite interval.

4. For graphical purposes a sufficiently good approximation to the median of $(p + q)^n$ is given by

$$M = nq - \frac{p - q}{6}.$$
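The approximation can be checked against exact partial sums. The sketch below evaluates M = nq − (p − q)/6 for $(\tfrac23 + \tfrac13)^{25}$, an assumed choice that reproduces the value M = 8.278 quoted in §5. As an independent comparison that is not part of the original method, it also finds where the partial sums cross one half. The sums are plotted at x = t + ½ on a normal-probability scale and the crossing is found by linear interpolation. The function names are hypothetical.

```python
from math import comb
from statistics import NormalDist

def approx_median(n, p, q):
    """M = nq - (p - q)/6, the approximation of Section 4."""
    return n * q - (p - q) / 6

def median_from_sums(n, p, q):
    """Point where the partial sums, plotted at x = t + 1/2 on a normal-probability
    scale, cross one half (linear interpolation between neighbouring points)."""
    std = NormalDist()
    running, prev = 0.0, None
    for t in range(n + 1):
        running += comb(n, t) * p ** (n - t) * q ** t
        point = (t + 0.5, std.inv_cdf(min(running, 1 - 1e-15)))
        if prev is not None and prev[1] < 0 <= point[1]:
            (x0, z0), (x1, z1) = prev, point
            return x0 + (x1 - x0) * (-z0) / (z1 - z0)
        prev = point
    raise ValueError("partial sums never reach one half")

n, p, q = 25, 2 / 3, 1 / 3
print(round(approx_median(n, p, q), 3))   # 8.278
print(round(median_from_sums(n, p, q), 3))
```

For this binomial the two values agree to within a few thousandths, which is well below what can be plotted.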


² The true values are given on page 220 of Mathematical Part of Elementary Statistics, by Camp, D. C. Heath and Company, 1931.

³ T. C. Fry, Probability and its Engineering Uses, p. 258, Van Nostrand, 1928.

The following tables enable us to find the first quartile $Q_1$ and the ninth decile $D_9$. These quantities can be plotted to only about one-tenth the accuracy to which they are given here, so accurate interpolation is seldom necessary. The values of $S_{t+1}$ are to be read from the graph at the points $t + \tfrac12$, as indicated in the directions preceding the tables. The graphical method will be found efficient if one uses common sense in the computation. Numbers which are to be plotted should not be computed to a higher degree of accuracy than can be used graphically. In reading the values of $S_{t+1}$ it is well to remember that the true values lie on a curve, and that outside the interval from $Q_1$ to $D_9$ they are slightly less than those given by the straight line. Once the graph has been made, all the values of $S_{t+1}$ can be read quickly; it is not necessary to make a separate computation for each t. This method is therefore especially advantageous when one wishes to find several sums of this sort for the same point binomial. It should also be noticed that one can tell from the appearance of the graph about how far the true sum would be from the two straight lines, and so estimate the error to which the reading is liable.

5. Illustration. Find the sum of the first 7 terms of $(\tfrac23 + \tfrac13)^{25}$.

Here $t = 6$, $M = 8.278$, $Q_1 = 6.726$, $D_9 = 11.369$. The graph shows that

$$S_7 = \sum_{i=0}^{6} u_i = 0.224.$$

The true value is 0.222, so the error is 0.002.

An idea of the accuracy of the method is given by the errors (out of two places) that would be obtained for this point binomial for various values of $t$, as follows:







10 

12 

14 

10 

Errors 

00 

01 

00 

00 

00 1 

00 

00 

00 
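A purely numerical version of the two-line reading can make the procedure of §§4–5 concrete. The sketch below assumes the illustration's values M = 8.278, Q₁ = 6.726 and D₉ = 11.369. It places Q₁ at the 25 per cent level and D₉ at the 90 per cent level of a normal-probability scale, and evaluates whichever line applies at x = t + ½. The function name is hypothetical. Computing the line exactly is only a stand-in for reading a hand-drawn graph.

```python
from statistics import NormalDist

def read_from_lines(x, M, Q1, D9):
    """Value of the straight line MQ1 (x <= M) or MD9 (x > M) at x, drawn on a
    normal-probability scale and converted back to a probability."""
    std = NormalDist()
    if x <= M:
        slope = -std.inv_cdf(0.25) / (M - Q1)   # Q1 lies on the 25% level
    else:
        slope = std.inv_cdf(0.90) / (D9 - M)    # D9 lies on the 90% level
    return std.cdf(slope * (x - M))

t = 6
estimate = read_from_lines(t + 0.5, M=8.278, Q1=6.726, D9=11.369)
print(round(estimate, 3))
```

With these inputs the line MQ₁ gives roughly 0.22 at x = 6.5, within a few thousandths of the true value 0.222. The graphical reading reported in §5 is 0.224.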


Directions for Use of the Tables: Let $p \geq q$, $M = nq - (p - q)/6$, $Q_1 = X_1 + qn$, $D_9 = X_2 + qn$. On the graph draw the lines $MQ_1$ and $MD_9$. Read $S_{t+1}$ at $t + \tfrac12$.

Values of $X_1$

INEQUALITIES AMONG AVERAGES

By Nilan Norris

Numerous inequalities among averages of various types are condensed in the monotonic character of the function

$$\phi(t) = \left[\frac{x_1^t + x_2^t + \cdots + x_n^t}{n}\right]^{1/t}$$

of the positive numbers $x_1, x_2, \cdots, x_n$, not all equal each to each. For $t = -1$ this function is the harmonic mean; for $t = 0$ it is the geometric mean; for $t = 1$ the arithmetic mean; and for $t = 2$ the root mean square. The relations among these four means are customarily proved by special and disconnected methods. They appear easily as applications of the theorem that $\phi(t)$ is an increasing function of $t$. That is, for any values of $t_1$ and $t_2$ such that $-\infty < t_1 < t_2 < +\infty$, it will be true that $\phi(t_1) < \phi(t_2)$. Several proofs of this theorem have been published, many of them very complex. An extremely simple proof is herewith presented.¹
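The function φ(t) is easy to evaluate directly. The Python sketch below computes it for an arbitrary illustrative sample, treating t = 0 through its limiting value, the geometric mean. It prints the four classical means named above. The sample values and the name `power_mean` are illustrative.

```python
import math

def power_mean(xs, t):
    """phi(t) = [(x_1^t + ... + x_n^t) / n]^(1/t); at t = 0 the limit,
    the geometric mean, is returned."""
    if any(x <= 0 for x in xs):
        raise ValueError("phi(t) is defined here for positive numbers only")
    n = len(xs)
    if t == 0:
        return math.exp(sum(math.log(x) for x in xs) / n)
    return (sum(x ** t for x in xs) / n) ** (1 / t)

xs = [1.0, 2.0, 4.0, 8.0]          # illustrative sample
names = {-1: "harmonic", 0: "geometric", 1: "arithmetic", 2: "root mean square"}
values = [power_mean(xs, t) for t in names]
for t, v in zip(names, values):
    print(f"t = {t:+d}  {names[t]:<17} {v:.4f}")
```

For this sample the four means come out as 2.1333, 2.8284, 3.7500 and 4.6098, in increasing order as the theorem requires.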

That $\phi(t)$, $\phi'(t)$, and $\phi''(t)$ all exist and are continuous for all real values of $t$ may be shown by expanding each of the quantities $x_i^t$ in a series of powers of $t$ and considering the remainders after each of the first three terms. The ordinary rule for evaluating forms reducing to 0/0 requires the function under consideration to be continuous and to have at least a continuous first derivative for $t = 0$. That rule may then be applied to $\log\phi(t) = t^{-1}\log[(x_1^t + \cdots + x_n^t)/n]$ to show that $\phi(0)$ is the geometric mean. It is clear that $\phi(-\infty)$ and $\phi(+\infty)$ are respectively the least and the greatest of the $x_i$. This fact and the monotonic property of $\phi(t)$ make it evident that for each real value of $t$ the function may be regarded as an average, in the usual sense that it lies within the range of the observations.

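For a simple demonstration of the increasing character of $\phi(t)$, consider the auxiliary function

$$F(t) = t^2\,\frac{\phi'(t)}{\phi(t)} = t^2\,\frac{d}{dt}\log\phi(t) = \frac{t\sum x^t \log x}{\sum x^t} - \log\frac{\sum x^t}{n}.$$

Since $t^2$ and $\phi(t)$ are positive, it is clear that for $t \neq 0$ the derivative $\phi'(t)$ has the same sign as $F(t)$. The theorem will be proved by showing that $F(t)$ is positive for all values of $t$ except zero, where $F(t)$ vanishes.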


¹ Professor Harold Hotelling rendered invaluable assistance in condensing for publication the material herein presented from a more extended study of generalized mean value functions.

Differentiating the last expression with respect to $t$ one obtains upon simplification

$$F'(t) = \frac{t}{\left(\sum x^t\right)^2}\left[\sum x^t \sum x^t \log^2 x - \left(\sum x^t \log x\right)^2\right].$$

By Cauchy's inequality (known as Schwarz' inequality when applied to integrals instead of sums), the expression in square brackets is positive. Hence $F'(t)$ has the same sign as $t$. Consequently $F(t)$, since it diminishes for negative values of $t$ and increases for positive values, has a minimum for $t = 0$. But by direct substitution, $F(0) = 0$. It follows that $F(t)$ and $\phi'(t)$ are positive for all values of $t$ other than zero. Therefore $\phi(t)$ is an increasing function.
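The sign argument can be checked numerically. The sketch below evaluates the auxiliary function F(t) and the closed form of F′(t) for an illustrative sample. It confirms that F vanishes at t = 0, is positive elsewhere, and has a derivative with the same sign as t. The sample and the function names are illustrative.

```python
import math

def F(xs, t):
    """Auxiliary function F(t) = t^2 phi'(t)/phi(t)
    = t * sum(x^t log x) / sum(x^t) - log(sum(x^t) / n)."""
    n = len(xs)
    s0 = sum(x ** t for x in xs)
    s1 = sum(x ** t * math.log(x) for x in xs)
    return t * s1 / s0 - math.log(s0 / n)

def F_prime(xs, t):
    """F'(t) = t [sum x^t * sum x^t log^2 x - (sum x^t log x)^2] / (sum x^t)^2."""
    s0 = sum(x ** t for x in xs)
    s1 = sum(x ** t * math.log(x) for x in xs)
    s2 = sum(x ** t * math.log(x) ** 2 for x in xs)
    return t * (s0 * s2 - s1 ** 2) / s0 ** 2

xs = [0.5, 1.0, 3.0, 7.0]          # illustrative sample, not all equal
for t in (-3.0, -1.0, -0.1, 0.0, 0.1, 1.0, 3.0):
    print(f"t = {t:+.1f}   F = {F(xs, t):.6f}   F' = {F_prime(xs, t):+.6f}")
```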

By direct general methods it is possible to show that

$$\phi'(0) = \frac{\left(\prod x\right)^{1/n}}{2n^2}\left[n\sum(\log x)^2 - \left(\sum \log x\right)^2\right].$$

This expression obviously vanishes only when $n\sum(\log x)^2 = \left(\sum \log x\right)^2$, a condition which is satisfied only in the trivial case when $x_1 = x_2 = \cdots = x_n$.
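The closed form for φ′(0) can be compared with a central difference quotient. The sketch below uses an illustrative sample. The step size h = 10⁻³ is an arbitrary choice, small enough for the comparison, and the function names are illustrative.

```python
import math

def power_mean(xs, t):
    """phi(t) for t != 0."""
    return (sum(x ** t for x in xs) / len(xs)) ** (1 / t)

def phi_prime_at_zero(xs):
    """(prod x)^(1/n) / (2 n^2) * [n * sum (log x)^2 - (sum log x)^2]."""
    n = len(xs)
    logs = [math.log(x) for x in xs]
    geometric_mean = math.exp(sum(logs) / n)
    return geometric_mean / (2 * n ** 2) * (n * sum(v * v for v in logs) - sum(logs) ** 2)

xs = [0.5, 1.0, 3.0, 7.0]          # illustrative sample
h = 1e-3
central = (power_mean(xs, h) - power_mean(xs, -h)) / (2 * h)
print(phi_prime_at_zero(xs), central)
print(phi_prime_at_zero([2.0, 2.0, 2.0]))   # trivial case: all x equal
```

The two estimates agree closely for the unequal sample, and the formula gives zero (up to rounding) when all the x's are equal.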

A proof exactly parallel to that given above may be applied to integrals or, more generally, to Stieltjes integrals. The monotonic increasing character of

$$\left[\int_{x=0}^{\infty} x^t\, d\psi(x)\right]^{1/t}$$

appears in this way under certain assumptions. One assumes that $\psi(x)$ is a non-decreasing function integrable in the Riemann-Stieltjes sense, such that $\psi(\infty) - \psi(0) = 1$, and such that $\int_{x=0}^{\infty} x^t\, d\psi(x)$ exists for every real value of $t$. In terms of statistical theory, this consideration extends the theorem from samples to populations of a very general character.

Proof of the increasing character of $\phi(t)$ has also been derived from Hölder's inequality, the demonstration being expressed in terms of Stieltjes integrals.² The simplest general proof of the monotonic attribute of $\phi(t)$ heretofore published appears to be that of Paul Lévy.³ As early as 1840 Bienaymé⁴ presented a generalized form of $\phi(t)$, namely,

$$\left[\frac{c_1 x_1^t + c_2 x_2^t + \cdots + c_n x_n^t}{c_1 + c_2 + \cdots + c_n}\right]^{1/t},$$

and announced, without proof, its increasing character. In 1858 a proof of the monotonic quality of $\phi(t)$ for special cases was published by Schlömilch.⁵ Of


² J. Shohat, “Stieltjes Integrals in Mathematical Statistics,” Annals of Mathematical Statistics (American Statistical Association, Ann Arbor, 1930), Vol. 1, No. 1, p. 84.

³ Calcul des Probabilités (Gauthier-Villars et Cie., Paris, 1925), pp. 157 f.

⁴ Jules Bienaymé, Société Philomatique de Paris, Extraits des Procès-Verbaux des Séances Pendant L'Année 1840 (Imprimerie D'A. René et Cie., Paris, 1841), Séance du 13 juin 1840, p. 68.

⁵ O. Schlömilch, “Ueber Mittelgrössen verschiedener Ordnungen,” Zeitschrift für Mathematik und Physik (B. G. Teubner, Leipzig, 1858), Vol. 3, pp. 303 f.

the more recent general proofs of the increasing character of $\phi(t)$ which have appeared, those of Jensen,⁶ Pólya,⁷ Jessen,⁸ and Carathéodory⁹ may be mentioned. A recent application of $\phi(t)$ to index number theory is that of Professor John B. Canning.¹⁰

Vassar College. 


⁶ J. L. W. V. Jensen, “Sur Les Fonctions Convexes Et Les Inégalités Entre Les Valeurs Moyennes,” Acta Mathematica (Beijers Bokförlagsaktiebolag, Stockholm, 1905), Vol. 30, pp. 183-185.

⁷ G. Pólya and G. Szegö, Aufgaben und Lehrsätze Aus Der Analysis (Julius Springer, Berlin, 1925), Vol. I, pp. 54 f. and 210.

⁸ Børge Jessen, “Bemaerkninger om konvekse Funktioner og Uligheder imellem Middelvaerdier,” Matematisk Tidsskrift (Charles Johansens Bogtrykkeri, Copenhagen, 1931), No. 2, 1931, pp. 26-28.

⁹ Attributed to Professor Constantin Carathéodory in an unpublished manuscript of Professor Harold Hotelling.

¹⁰ “Theorem Concerning a Certain Family of Averages of a Certain Type of Frequency Distribution,” a paper presented before a joint meeting of the American Statistical Association and the Econometric Society at Berkeley, California, June 22, 1934.



MATHEMATICAL EXPECTATION OF PRODUCT MOMENTS OF SAMPLES DRAWN FROM A SET OF INFINITE POPULATIONS


By Hyman M. Feldman¹

Introduction 

In the second part of his investigations, “On the Mathematical Expectation of Moments of Frequency Distributions,”² Tchouproff presented a method which may be interpreted as sampling from a set of infinite univariate populations. In the present paper this method is extended to the study of moments of product moments of samples drawn from a set of infinite bivariate populations. It is also shown how this method may be extended to populations of higher order by deriving some of the simpler formulae for populations of three and four variables.

Tchouproff's method has been criticised³ because of the complicated algebra. On close examination it is found, however, that it is not the algebra which is complicated but rather the symbolism. Tchouproff introduced a great variety of symbols both in his derivations and in his results. As a consequence his work seems very intricate. If, however, the number of symbols is reduced, and the symbols themselves are simplified, which can be easily accomplished, the underlying idea of Tchouproff's method is found to be very simple.

Quite a complete study of product moments of any bivariate population has been made by Joseph Pepper in his “Studies in the Theory of Sampling.”⁴ His method is essentially an extension to bivariate populations of the method Church⁵ used in his studies of univariate populations. He does not, however, derive any generalized formulae. In the present study generalized formulae for both the first moment and the variance of product moments of any order are obtained.

It may be noted here that all of Pepper's formulae for any infinite population can be obtained from those of the present study as special cases, by assuming that all the populations in the set are identical.


¹ A dissertation presented to the Board of Graduate Studies of Washington University in partial fulfilment of the requirements for the degree of Doctor of Philosophy, June 1933.

² Biometrika, Vol. XXI, Dec. 1929, pp. 231-258.

³ Church, A. E. R., “On the Means and Squared Standard Deviations of Small Samples from any Population,” Biometrika, Vol. XVIII, Nov., 1926, pp. 321-394.

⁴ Biometrika, Vol. XXI, Dec. 1929, pp. 231-258.

⁵ Church, A. E. R., “On the Means and Squared Standard Deviations of Small Samples from any Population,” Biometrika, Vol. XVIII, Nov., 1926, pp. 321-394.

Chapter I. Notations and Definitions

Let $(X_1, Y_1), (X_2, Y_2), \cdots, (X_n, Y_n)$ be $n$ bivariate populations, each following any law of distribution whatever. The product moment of order $a$ in $X$ and $b$ in $Y$ of the $i$th population will be denoted by $P^i_{ab}$. It is defined as

$$P^i_{ab} = E(X_i - a_i)^a (Y_i - b_i)^b, \tag{1.11}$$

where

$$a_i = E(X_i), \qquad b_i = E(Y_i), \tag{1.12}$$

and where the symbol $E$ signifies the expected value or the mathematical expectation of a quantity.

Regarding each of the $n$ populations of the set as infinite,⁶ samples of $n$ are drawn, each member of a sample from one of the $n$ populations.⁷ The individual which is drawn from the $k$th population will be denoted by $(x_k, y_k)$. The product moment of order $a$ in $x$ and $b$ in $y$ of such a sample will be denoted by $p_{ab}$. This product moment may then be defined as

$$p_{ab} = n^{-1} S (x_k - \bar{x})^a (y_k - \bar{y})^b, \tag{1.13}$$

where

$$\bar{x} = n^{-1} S x_k, \qquad \bar{y} = n^{-1} S y_k. \tag{1.14}$$

The symbols $a$ and $b$ will now be defined by the equations

$$a = n^{-1} S a_k, \qquad b = n^{-1} S b_k. \tag{1.15}$$

Obviously

$$E(\bar{x}) = E(n^{-1} S x_k) = n^{-1} S E(x_k) = n^{-1} S a_k = a. \tag{1.16}$$

Similarly $E(\bar{y}) = b$. That is, the mathematical expectation of the mean of such a sample as was described above is equal to the average of the means of all the populations.⁸

In order to make the equations as compact as possible the following additional symbols will be employed:

$$x_k - a_k = u_k, \quad \bar{x} - a = \bar{u}, \quad u_k - \bar{u} = U_k; \qquad y_k - b_k = v_k, \quad \bar{y} - b = \bar{v}, \quad v_k - \bar{v} = V_k; \tag{1.17}$$

also $a_k - a = A_k$, $b_k - b = B_k$.

From the above definitions it easily follows that

$$E(u_k) = E(v_k) = E(U_k) = E(V_k) = E(\bar{u}) = E(\bar{v}) = 0. \tag{1.18}$$

⁶ The term infinite is used here in the probability sense. It is defined very clearly by Church in his “Means and Squared Standard Deviations of Small Samples,” Biometrika, Vol. XVIII, Nov., 1926, p. 322.

⁷ It may be easily shown that this is equivalent to drawing a sample of n from a set of any finite number of populations. The number drawn from each population, however, must be specified. See Biometrika, Vol. XIII, 1920-21, p. 295, footnote.

⁸ This, of course, is a result of the Lexis Theory, for Poisson and Lexis Series.

The notation is now completed with the definition of the symbol $Q_{qr}$ by the equation:

$$Q_{qr} = S(a_k - a)^q (b_k - b)^r = S A_k^q B_k^r. \tag{1.19}$$

Chapter II. The Mathematical Expectation of $p_{ab}$

The mathematical expectation of $p_{ab}$ will be denoted by $\bar{p}_{ab}$. In the terminology of moments this would be called the mean or first moment of the distribution of $p_{ab}$.


1. The Mathematical Expectation of $p_{11}$. According to the above notation the expected value of $p_{11}$ is $\bar{p}_{11}$. By definition

$$\bar{p}_{11} = E(p_{11}) = E\,n^{-1} S (x_i - \bar{x})(y_i - \bar{y}), \tag{2.11}$$

and obviously $E\,n^{-1} S (x_i - \bar{x})(y_i - \bar{y}) = n^{-1} S E (x_i - \bar{x})(y_i - \bar{y})$.

Writing

$$x_i - \bar{x} = [(x_i - a_i) - (\bar{x} - a)] + [a_i - a] = U_i + A_i,$$

$$y_i - \bar{y} = [(y_i - b_i) - (\bar{y} - b)] + [b_i - b] = V_i + B_i,$$

equation (2.11) may be written as

$$\bar{p}_{11} = n^{-1} S E (U_i + A_i)(V_i + B_i) = n^{-1} S E(U_i V_i) + n^{-1} S A_i E(V_i) + n^{-1} S B_i E(U_i) + n^{-1} S E(A_i B_i).$$

Since for any given population $A_i$ and $B_i$ are constants, it follows that $E(A_i B_i) = A_i B_i$. Hence

$$n^{-1} S E(A_i B_i) = n^{-1} S A_i B_i = n^{-1} Q_{11}.$$

Making use of (1.18), it is seen that the terms $n^{-1} S A_i E(V_i)$ and $n^{-1} S B_i E(U_i)$ are zero. The only term left to evaluate is therefore $n^{-1} S E(U_i V_i)$. Since $U_i$ and $V_i$ are symmetric functions of the corresponding small letters, their product is symmetric in $u_i v_i$. There is therefore no loss in generality if attention is concentrated on a single subscript, say 1.

We may therefore write*

$$n^{-1} S E(U_i V_i) = n^{-1} E(U_1 V_1) + n^{-1} \underset{2}{S} E(U_i V_i).$$

Remembering that $U_i = u_i - \bar{u} = u_i - n^{-1} S u_j$, we may write

$$U_i = u_i - \bar{u} = u_i - n^{-1}(u_1 + u_2 + \cdots + u_n) = n^{-1}[n_1 u_i - (u_1 + u_2 + \cdots + u_{i-1} + u_{i+1} + \cdots + u_n)],$$

* The 2 at the bottom of the S simply indicates that the summation begins with i = 2.

where $n_1 = n - 1$. In general, $n_i$ will denote the number $n - i$. Similarly

$$V_i = n^{-1}[n_1 v_i - (v_1 + v_2 + \cdots + v_{i-1} + v_{i+1} + \cdots + v_n)].$$

Thus

$$n^{-1} S E(U_i V_i) = n^{-3} E(n_1 u_1 - u_2 - \cdots - u_n)(n_1 v_1 - v_2 - \cdots - v_n) + n^{-3} \underset{2}{S} E(n_1 u_i - u_1 - \cdots - u_{i-1} - u_{i+1} - \cdots - u_n)(n_1 v_i - v_1 - \cdots - v_{i-1} - v_{i+1} - \cdots - v_n).$$

When the right-hand side of the last equation is expanded, the only terms which appear are of the form $E(u_i v_i)$ and $E(u_i v_j)$. The last one must vanish, for $u_i$ and $v_j$ are independent and hence $E(u_i v_j) = E(u_i)E(v_j) = 0$. From the last equation above it is easily seen that the coefficient of $E(u_1 v_1)$ is

$$n^{-3}(n_1^2 + n_1) = n^{-3} n_1 (n_1 + 1) = n^{-2} n_1,$$

and because of the symmetry this is obviously the coefficient of any term of that form. Hence

$$n^{-1} S E(U_i V_i) = n^{-2} n_1 S E(u_i v_i).$$

Since $u_i = x_i - a_i$ and $v_i = y_i - b_i$,

$$E(u_i v_i) = E(x_i - a_i)(y_i - b_i) = E(X_i - a_i)(Y_i - b_i) = P^i_{11},$$

and in general,

$$E(u_i^a v_i^b) = P^i_{ab}. \tag{2.12}$$

We thus get the formula

$$\bar{p}_{11} = n^{-2} n_1 S P^i_{11} + n^{-1} Q_{11}. \tag{1}$$

Now suppose all the $n$ populations are identical. Then all the $A$'s and also all the $B$'s vanish, and therefore $Q_{11} = 0$. Since $S P^i_{11} = n P_{11}$, the formula (1) thus becomes

$$\bar{p}_{11} = \frac{n_1}{n} P_{11}. \tag{1'}$$

This is exactly Pepper's formula for $\bar{p}_{11}$ for an infinite population.*
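Formula (1) can be verified exactly on a small example. The Python sketch below uses four hypothetical discrete bivariate populations and enumerates every sample that contains one member from each. It then compares the exact expectation of $p_{11}$ with $n^{-2}n_1 S P^i_{11} + n^{-1}Q_{11}$. It also checks (1′) by repeating one population. Rational arithmetic (`fractions.Fraction`) makes the comparison exact. The populations and helper names are hypothetical.

```python
from fractions import Fraction as Fr
from itertools import product

# Hypothetical populations: lists of (x, y, probability) atoms.
POPS = [
    [(0, 0, Fr(1, 2)), (1, 2, Fr(1, 4)), (3, 1, Fr(1, 4))],
    [(1, 1, Fr(1, 3)), (2, 0, Fr(2, 3))],
    [(-1, 2, Fr(1, 5)), (0, 0, Fr(2, 5)), (2, 3, Fr(2, 5))],
    [(0, 1, Fr(1, 2)), (4, 0, Fr(1, 2))],
]

def means(pop):
    return sum(p * x for x, _, p in pop), sum(p * y for _, y, p in pop)

def P(pop, a, b):
    """Population product moment P_ab, as in (1.11)."""
    mx, my = means(pop)
    return sum(p * (x - mx) ** a * (y - my) ** b for x, y, p in pop)

def expected_p(pops, a, b):
    """Exact E(p_ab), with p_ab as in (1.13), by enumerating every sample."""
    n = len(pops)
    total = Fr(0)
    for draw in product(*pops):
        prob = Fr(1)
        for _, _, p in draw:
            prob *= p
        xbar = Fr(sum(x for x, _, _ in draw), n)
        ybar = Fr(sum(y for _, y, _ in draw), n)
        total += prob * sum((x - xbar) ** a * (y - ybar) ** b for x, y, _ in draw) / n
    return total

def offsets(pops):
    """A_i = a_i - a and B_i = b_i - b."""
    m = [means(pop) for pop in pops]
    a = sum(mx for mx, _ in m) / len(pops)
    b = sum(my for _, my in m) / len(pops)
    return [mx - a for mx, _ in m], [my - b for _, my in m]

n, n1 = len(POPS), len(POPS) - 1
A, B = offsets(POPS)
Q11 = sum(Ai * Bi for Ai, Bi in zip(A, B))
formula_1 = Fr(n1, n ** 2) * sum(P(pop, 1, 1) for pop in POPS) + Q11 / n
print(expected_p(POPS, 1, 1), formula_1)

# (1'): identical populations.
same = [POPS[0]] * n
print(expected_p(same, 1, 1), Fr(n1, n) * P(POPS[0], 1, 1))
```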

2. The Mathematical Expectation of $p_{21}$. By definition

$$\bar{p}_{21} = E\,n^{-1} S (x_i - \bar{x})^2 (y_i - \bar{y}). \tag{2.21}$$

* Biometrika, Vol. XXI, p. 233, Eq. A, $N = \infty$. As was already stated in the introduction, all the formulae of the present study reduce to Pepper's when the above assumption is made.

Proceeding as above it is seen that

$$E\,n^{-1} S (x_i - \bar{x})^2 (y_i - \bar{y}) = n^{-1} S E (x_i - \bar{x})^2 (y_i - \bar{y}) = n^{-1} S E (U_i + A_i)^2 (V_i + B_i)$$

$$= n^{-1} S E(U_i^2 V_i) + 2n^{-1} S E(U_i V_i A_i) + n^{-1} S E(U_i^2 B_i) + n^{-1} S E(V_i A_i^2) + 2n^{-1} S E(U_i A_i B_i) + n^{-1} S E(A_i^2 B_i). \tag{2.22}$$

It is quite evident that the two terms before the last vanish. To evaluate the remaining terms, we employ the reasoning of section 1 of this chapter and write:

$$S E(U_i^2 V_i) = E(U_1^2 V_1) + \underset{2}{S} E(U_i^2 V_i) = n^{-3} E(n_1 u_1 - u_2 - \cdots)^2 (n_1 v_1 - v_2 - \cdots) + n^{-3} \underset{2}{S} E(n_1 u_i - u_1 - \cdots)^2 (n_1 v_i - v_1 - \cdots).$$

Since terms of the form $E(u_i^2 v_j)$ vanish, only the coefficient of the term $E(u_i^2 v_i)$ must be found. Again considering the subscript 1, the coefficient of $E(u_1^2 v_1)$ is easily found from the last equation to be

$$n^{-3}(n_1^3 - n_1) = n^{-3} n_1 (n_1 + 1)(n_1 - 1) = n^{-2} n_1 n_2.$$

Thus

$$n^{-1} S E(U_i^2 V_i) = n^{-3} n_1 n_2 S E(u_i^2 v_i) = n^{-3} n_1 n_2 S P^i_{21}. \tag{2.23}$$

For the second term of (2.22) we have*

$$S E(U_i V_i A_i) = E(U_1 V_1 A_1) + \underset{2}{S} E(U_i V_i A_i) = n^{-2} E(n_1 u_1 - u_2 - \cdots)(n_1 v_1 - v_2 - \cdots) A_1 + n^{-2} \underset{2}{S} E(n_1 u_i - u_1 - \cdots)(n_1 v_i - v_1 - \cdots) A_i.$$

The coefficient of $E(u_1 v_1)$ in the first term of the right-hand side of the last equation is $n^{-2} n_1^2 A_1$. In the second term it is $n^{-2} \underset{2}{S} A_i = -n^{-2} A_1$, since $S A_i = 0$. The total coefficient is therefore $n^{-2}(n_1^2 - 1) A_1 = n^{-1} n_2 A_1$, because $n_1^2 - 1 = n n_2$. It therefore follows that

$$2n^{-1} S E(U_i V_i A_i) = 2n^{-2} n_2 S P^i_{11} A_i. \tag{2.24}$$

Quite similarly

$$n^{-1} S E(U_i^2 B_i) = n^{-2} n_2 S P^i_{20} B_i, \tag{2.25}$$

and it is obvious that

$$n^{-1} S E(A_i^2 B_i) = n^{-1} Q_{21}. \tag{2.26}$$

* Note that the $u$ which has the coefficient $n_1$ does not occur among the $u$'s which have the negative sign.

We thus get the formula

$$\bar{p}_{21} = n^{-3} n_1 n_2 S P^i_{21} + n^{-2} n_2 S (2 P^i_{11} A_i + P^i_{20} B_i) + n^{-1} Q_{21}. \tag{2}$$
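Formula (2) can be checked in the same exact way. The sketch below repeats the enumeration helpers so that it runs on its own. It uses the same four hypothetical populations and compares the exact $E(p_{21})$ with the right-hand side of (2). All names are hypothetical.

```python
from fractions import Fraction as Fr
from itertools import product

POPS = [  # hypothetical (x, y, probability) atoms for n = 4 populations
    [(0, 0, Fr(1, 2)), (1, 2, Fr(1, 4)), (3, 1, Fr(1, 4))],
    [(1, 1, Fr(1, 3)), (2, 0, Fr(2, 3))],
    [(-1, 2, Fr(1, 5)), (0, 0, Fr(2, 5)), (2, 3, Fr(2, 5))],
    [(0, 1, Fr(1, 2)), (4, 0, Fr(1, 2))],
]

def means(pop):
    return sum(p * x for x, _, p in pop), sum(p * y for _, y, p in pop)

def P(pop, a, b):
    """Population product moment P_ab."""
    mx, my = means(pop)
    return sum(p * (x - mx) ** a * (y - my) ** b for x, y, p in pop)

def expected_p(pops, a, b):
    """Exact E(p_ab) by enumerating every sample with one member per population."""
    n = len(pops)
    total = Fr(0)
    for draw in product(*pops):
        prob = Fr(1)
        for _, _, p in draw:
            prob *= p
        xbar = Fr(sum(x for x, _, _ in draw), n)
        ybar = Fr(sum(y for _, y, _ in draw), n)
        total += prob * sum((x - xbar) ** a * (y - ybar) ** b for x, y, _ in draw) / n
    return total

n, n1, n2 = len(POPS), len(POPS) - 1, len(POPS) - 2
m = [means(pop) for pop in POPS]
A = [mx - sum(v for v, _ in m) / n for mx, _ in m]   # A_i = a_i - a
B = [my - sum(v for _, v in m) / n for _, my in m]   # B_i = b_i - b
Q21 = sum(Ai ** 2 * Bi for Ai, Bi in zip(A, B))
formula_2 = (Fr(n1 * n2, n ** 3) * sum(P(pop, 2, 1) for pop in POPS)
             + Fr(n2, n ** 2) * sum(2 * P(pop, 1, 1) * Ai + P(pop, 2, 0) * Bi
                                    for pop, Ai, Bi in zip(POPS, A, B))
             + Q21 / n)
print(expected_p(POPS, 2, 1) == formula_2)
```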

3. The Mathematical Expectation of $p_{31}$ and $p_{22}$.

$$\bar{p}_{31} = E\,n^{-1} S (x_i - \bar{x})^3 (y_i - \bar{y}) = n^{-1} S E (x_i - \bar{x})^3 (y_i - \bar{y})$$

$$= n^{-1} S E (U_i + A_i)^3 (V_i + B_i) = n^{-1} S \{E(U_i^3 V_i + U_i^3 B_i + 3U_i^2 V_i A_i + 3U_i^2 A_i B_i + 3U_i V_i A_i^2 + 3U_i A_i^2 B_i + V_i A_i^3 + A_i^3 B_i)\}. \tag{2.31}$$

The two terms before the last are zero. The last term is

$$n^{-1} S E(A_i^3 B_i) = n^{-1} Q_{31}. \tag{2.32}$$

By (2.23) and (2.24) and some slight manipulation

$$3n^{-1} S E(U_i^2 A_i B_i + U_i V_i A_i^2) = 3n^{-2} n_2 S (P^i_{20} A_i B_i + P^i_{11} A_i^2) + 3n^{-3}(Q_{11} S P^i_{20} + Q_{20} S P^i_{11}), \tag{2.33}$$

and by (2.22)

$$n^{-1} S E(U_i^3 B_i + 3U_i^2 V_i A_i) = n^{-4}(n_1^3 + 1) S (P^i_{30} B_i + 3P^i_{21} A_i). \tag{2.34}$$

The only new term which is to be evaluated is $S E(U_i^3 V_i)$. This may be written as follows:

$$S E(U_i^3 V_i) = n^{-4} S E (n_1 u_i - u_1 - \cdots)^3 (n_1 v_i - v_1 - \cdots).$$

When the right-hand side is expanded it is found that the only non-vanishing terms are of the form $E(u_i^3 v_i)$ and $E(u_i^2 u_j v_j)$. Only two subscripts, therefore, have to be considered. Without any loss in generality these may be taken as 1 and 2, and the right-hand side of the last equation may then be written as follows:

$$S E (n_1 u_i - u_1 - \cdots)^3 (n_1 v_i - v_1 - \cdots) = E(n_1 u_1 - u_2 - \cdots)^3 (n_1 v_1 - v_2 - \cdots) + E(n_1 u_2 - u_1 - \cdots)^3 (n_1 v_2 - v_1 - \cdots) + \underset{3}{S} E(n_1 u_i - u_1 - u_2 - \cdots)^3 (n_1 v_i - v_1 - v_2 - \cdots).$$

From this last expansion it is easily seen that the coefficient of $E(u_1^3 v_1)$ is $(n_1^4 + n_1)$ and that of $E(u_1^2 u_2 v_2)$ is $(6n_1^2 + 3n_2) = 3(2n_1^2 + n_2)$. We thus finally obtain

$$S E(U_i^3 V_i) = n^{-4}[(n_1^4 + n_1) S E(u_i^3 v_i) + 3(2n_1^2 + n_2) S E(u_i^2 u_j v_j)].$$

But by (2.12) $E(u_i^3 v_i) = P^i_{31}$. Since $u_i$ and $u_j$, and $u_i$ and $v_j$, are independent, $E(u_i^2 u_j v_j) = E(u_i^2) E(u_j v_j) = P^i_{20} P^j_{11}$. Whence

$$S E(U_i^3 V_i) = n^{-4}[(n_1^4 + n_1) S P^i_{31} + 3(2n_1^2 + n_2) S P^i_{20} P^j_{11}]. \tag{2.35}$$

From (2.31) and the succeeding equations we finally get

$$\bar{p}_{31} = n^{-5}[(n_1^4 + n_1) S P^i_{31} + 3(2n_1^2 + n_2) S P^i_{20} P^j_{11}] + n^{-4}(n_1^3 + 1) S (P^i_{30} B_i + 3P^i_{21} A_i)$$

$$+ 3n^{-3}[(n_1^2 - 1) S (P^i_{20} A_i B_i + P^i_{11} A_i^2) + Q_{11} S P^i_{20} + Q_{20} S P^i_{11}] + n^{-1} Q_{31}. \tag{3}$$
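Formula (3) can also be tested by exact enumeration. The sketch below uses the same hypothetical populations and reads $S P^i_{20} P^j_{11}$ as a sum over ordered pairs of distinct subscripts i ≠ j. That reading is an assumption, in line with the convention that indices in a summation never take the same value simultaneously. The enumeration helpers are repeated so that the block runs on its own, and all names are hypothetical.

```python
from fractions import Fraction as Fr
from itertools import product

POPS = [  # hypothetical (x, y, probability) atoms for n = 4 populations
    [(0, 0, Fr(1, 2)), (1, 2, Fr(1, 4)), (3, 1, Fr(1, 4))],
    [(1, 1, Fr(1, 3)), (2, 0, Fr(2, 3))],
    [(-1, 2, Fr(1, 5)), (0, 0, Fr(2, 5)), (2, 3, Fr(2, 5))],
    [(0, 1, Fr(1, 2)), (4, 0, Fr(1, 2))],
]

def means(pop):
    return sum(p * x for x, _, p in pop), sum(p * y for _, y, p in pop)

def P(pop, a, b):
    """Population product moment P_ab."""
    mx, my = means(pop)
    return sum(p * (x - mx) ** a * (y - my) ** b for x, y, p in pop)

def expected_p(pops, a, b):
    """Exact E(p_ab) by enumerating every sample with one member per population."""
    n = len(pops)
    total = Fr(0)
    for draw in product(*pops):
        prob = Fr(1)
        for _, _, p in draw:
            prob *= p
        xbar = Fr(sum(x for x, _, _ in draw), n)
        ybar = Fr(sum(y for _, y, _ in draw), n)
        total += prob * sum((x - xbar) ** a * (y - ybar) ** b for x, y, _ in draw) / n
    return total

n, n1, n2 = len(POPS), len(POPS) - 1, len(POPS) - 2
m = [means(pop) for pop in POPS]
A = [mx - sum(v for v, _ in m) / n for mx, _ in m]
B = [my - sum(v for _, v in m) / n for _, my in m]

def Q(q, r):
    """Q_qr = S A^q B^r, as in (1.19)."""
    return sum(Ai ** q * Bi ** r for Ai, Bi in zip(A, B))

P20 = [P(pop, 2, 0) for pop in POPS]
P11 = [P(pop, 1, 1) for pop in POPS]
pairs = sum(P20[i] * P11[j] for i in range(n) for j in range(n) if i != j)
formula_3 = (
    Fr(1, n ** 5) * ((n1 ** 4 + n1) * sum(P(pop, 3, 1) for pop in POPS)
                     + 3 * (2 * n1 ** 2 + n2) * pairs)
    + Fr(n1 ** 3 + 1, n ** 4) * sum(P(pop, 3, 0) * B[i] + 3 * P(pop, 2, 1) * A[i]
                                    for i, pop in enumerate(POPS))
    + Fr(3, n ** 3) * ((n1 ** 2 - 1) * sum(P20[i] * A[i] * B[i] + P11[i] * A[i] ** 2
                                           for i in range(n))
                       + Q(1, 1) * sum(P20) + Q(2, 0) * sum(P11))
    + Q(3, 1) / n
)
print(expected_p(POPS, 3, 1) == formula_3)
```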

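The derivation of $\bar{p}_{22}$ is so similar to that of $\bar{p}_{31}$ that it would be mere repetition to go through the details again. We shall therefore merely write down the formula for $\bar{p}_{22}$, which is

$$\bar{p}_{22} = n^{-5}[(n_1^4 + n_1) S P^i_{22} + (2n_1^2 + n_2) S (P^i_{20} P^j_{02} + 4P^i_{11} P^j_{11})] + 2n^{-4}(n_1^3 + 1) S (P^i_{21} B_i + P^i_{12} A_i)$$

$$+ n^{-3}[(n_1^2 - 1) S (P^i_{20} B_i^2 + 4P^i_{11} A_i B_i + P^i_{02} A_i^2) + Q_{20} S P^i_{02} + Q_{02} S P^i_{20} + 4Q_{11} S P^i_{11}] + n^{-1} Q_{22}. \tag{4}$$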


4. The Mathematical Expectation of the General Product Moment $p_{ab}$. So far, formulae for the mathematical expectation of $p_{ab}$ have been derived for particular values of $a$ and $b$. The method used in deriving these is, however, perfectly general. Now that it has been sufficiently illustrated, it can be easily generalized.

By definition we have

$$\bar{p}_{ab} = E[n^{-1} S (x_i - \bar{x})^a (y_i - \bar{y})^b].$$

Making use of the notation of Chapter I this may be written as

$$n\bar{p}_{ab} = E S (U_i + A_i)^a (V_i + B_i)^b = \sum_{q=0}^{a} \sum_{r=0}^{b} C^a_q C^b_r \, S E(U_i^{a-q} V_i^{b-r} A_i^q B_i^r), \tag{2.41}$$

where

$$C^a_q = \frac{a!}{q!(a-q)!}, \qquad C^b_r = \frac{b!}{r!(b-r)!}.$$

Expressing the $U$'s and $V$'s in terms of the $u$'s and $v$'s and setting $a - q = l$, $b - r = m$, we may write for a particular pair of values $q$ and $r$:

$$n^{l+m} S E(U_i^l V_i^m A_i^q B_i^r) = S E(n_1 u_i - u_1 - \cdots)^l (n_1 v_i - v_1 - \cdots)^m A_i^q B_i^r. \tag{2.42}$$

Consider, now, the general term in the expansion of the right-hand side of (2.42). It is of the form:


l\m\ 

najn/3J 




where $\Pi \alpha_h! = \alpha_1!\, \alpha_2! \cdots \alpha_k!$


... t>f*d*B:), (2.43) 


* In this case, and also in the formulae that follow, whenever two or more indices appear in a summation, it will be understood that no two of them can have the same value simultaneously.

For particular sets of values $j_1, j_2, \cdots, j_k$; $\alpha_1, \alpha_2, \cdots, \alpha_k$; and $\beta_1, \beta_2, \cdots, \beta_k$, this term will appear in every member of the summation of the right-hand side of (2.42). Its coefficient will differ only in the exponent of $(-n_1)$ and in the subscript $i$ of $A_i^q B_i^r$. Because of the symmetry there is no loss in generality if we take for $j_1, j_2, \cdots, j_k$ the first $k$ integers. We now break up the summation of the right-hand side of (2.42) as follows:

$$S E(n_1 u_i - u_1 - \cdots)^l (n_1 v_i - v_1 - \cdots)^m A_i^q B_i^r = E(n_1 u_1 - u_2 - \cdots)^l (n_1 v_1 - v_2 - \cdots)^m A_1^q B_1^r + \cdots + E(n_1 u_k - u_1 - \cdots)^l (n_1 v_k - v_1 - \cdots)^m A_k^q B_k^r + \underset{k+1}{S} E(n_1 u_i - u_1 - \cdots)^l (n_1 v_i - v_1 - \cdots)^m A_i^q B_i^r. \tag{2.44}$$

From (2.44) we easily get for the total coefficient (excluding the numerical factor) the expression

$$\sum_{h=1}^{k} (-n_1)^{\alpha_h + \beta_h} A_h^q B_h^r + \underset{k+1}{S} A_i^q B_i^r.$$


Writing

$$\underset{k+1}{S} A_i^q B_i^r = \underset{1}{S} A_i^q B_i^r - \sum_{h=1}^{k} A_h^q B_h^r = Q_{qr} - \sum_{h=1}^{k} A_h^q B_h^r,$$

the general term (2.43), together with the total coefficient, may then be written as


lla/i ! iTp/i ! \h = i J /.-i 

Since u^ and Ujj e» and and Ut and Vj an^ independent while Wi and i;» are 
not, we have: 

I. EUuk Vh = BEuh Vk — nP*j3,. 

II. Any term in which a/, + = 1 must vanish. 

From II it follows that the maximum number of subscripts which can appear in any term in the expansion of (2.42), i.e. the upper limit of k, which will be denoted by t, cannot exceed $(l+m)/2$. In fact when $l+m$ is even, $t = (l+m)/2$, while when $l+m$ is odd, t is the largest integer less than $(l+m)/2$.
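The bound on t can be confirmed by brute force. The sketch below is an illustration added here, not part of the original argument: it enumerates every way of splitting the exponents l and m over k subscripts, keeps only the splits allowed by II (every subscript that appears must carry $\alpha_h + \beta_h \geq 2$), and records the largest k that survives. The helper names `compositions` and `upper_limit_t` are hypothetical.

```python
def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers that sums to `total`."""
    if parts == 0:
        if total == 0:
            yield ()
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def upper_limit_t(l, m):
    """Largest k such that alpha_1 + ... + alpha_k = l and beta_1 + ... + beta_k = m
    can be chosen with alpha_h + beta_h >= 2 for every h (restriction II)."""
    best = 0
    for k in range(1, l + m + 1):
        feasible = any(
            all(a + b >= 2 for a, b in zip(alphas, betas))
            for alphas in compositions(l, k)
            for betas in compositions(m, k)
        )
        if feasible:
            best = k
    return best


for l, m in [(1, 1), (2, 1), (2, 2), (3, 2), (3, 3)]:
    print(f"l={l}, m={m}: brute force {upper_limit_t(l, m)}, (l+m)//2 = {(l + m) // 2}")
```

Since k subscripts each carrying degree at least 2 need a total degree of at least 2k, the brute-force maximum coincides with the integer part of $(l+m)/2$.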

Making use of (2.41), the equations following it, and the reasoning of the last 
paragraph, we finally get the formula: 


n(-n)«+^Pab = (a!) (h\) 


S 

3 A“1 


Q I tI ah^O 


s 


A -1 



[( — 


1]A%B', 



pih 

TT ^ ah$ h 


(5) 
The following restrictions on the α's and β's must be observed:
(a) $\alpha_1+\alpha_2+\cdots+\alpha_k = a-q$
(b) $\beta_1+\beta_2+\cdots+\beta_k = b-r$
(c) $\alpha_h+\beta_h > 1$.

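In case the n populations are identical (5) reduces as follows. The population means are then all equal, so that every $A_i$ and $B_i$ is zero; hence for $q = 0$, $r = 0$ we have $A_i^q = 1$, $B_i^r = 1$, and $Q_{00} = n$, while in every other case $A_i^qB_i^r = 0$ and $Q_{qr} = 0$. The summations with respect to q and r, therefore, disappear.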

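Consider now the summations

$$\sum_{j_1=1}^{n}\sum_{j_2=1}^{n}\cdots\sum_{j_k=1}^{n}P^{j_1}_{\alpha_1\beta_1}P^{j_2}_{\alpha_2\beta_2}\cdots P^{j_k}_{\alpha_k\beta_k}.$$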

Since all the populations are the same we may drop the j by actually carrying out the indicated summations. If, then, there are c repetitions among the k pairs of integers $\alpha_h\beta_h$, in which $\alpha_1\beta_1, \alpha_2\beta_2, \cdots$ are repeated $l_1, l_2, \cdots, l_c$ times respectively, then we have:

$$\sum_{j_1=1}^{n}\cdots\sum_{j_k=1}^{n}\prod_{h=1}^{k}P^{j_h}_{\alpha_h\beta_h} = \frac{k!\,C^n_k}{l_1!\,l_2!\cdots l_c!}\prod_{h=1}^{k}P_{\alpha_h\beta_h}.$$

We thus arrive at the following corollary: The mathematical expectation, $\bar p_{ab}$, of the product moment, $p_{ab}$, in samples of n from a single infinite population having any law of distribution is given by


n(~n)“+^Pa6 = 


s r S 


n Pi.,.- 


(50 


Note: In deriving these general formulae it was assumed that $n > t$. There is, however, no loss in generality in this assumption. For, if $t > n$, we may suppose that $x_{n+1} = x_{n+2} = \cdots = x_t = 0$, and hence $P^{n+1}_{\alpha\beta} = \cdots = P^{t}_{\alpha\beta} = 0$, and thus the above reasoning is still valid.


5. Formulae for $\bar p_{41}$, $\bar p_{32}$, $\bar p_{51}$, $\bar p_{42}$, $\bar p_{33}$. Formulae for $\bar p_{ab}$ in which $a + b = 5, 6, 7, 8$ have been obtained. But for $a + b > 6$ these formulae become very long, and since they will be of no use in the subsequent work, only those of order 5 and 6 are given below.

P41 = n-o {(«; - n:)SP\, + 27m|S(2PJoPli + ^P\xPU)\ 

+ n-^ {(nj + n,mP\,B, + iPl.A.) + &nn2S{PUB,Pi, + 2P;U.^^^o)! 


This is a generalization of Pepper's results for N — See Biometrika Vol. XXI, 
pp. 231-240. 

t The symbol PhAtPio is an abbreviation of the full term + A,) (P 11 P 20 + 
P 11 P 20 ). Similar abbreviations will be used in the other formulae. 
+ 2n-* |(n? + 1 ) S{2PIoA.B, + SPUA\) - 2 QnSPJo - , )

+ 2n-’ {(r 5 - 1)S(2P; i4j3PJo^!P.) + 2 Q,„.SPI ^ + SQjiSP^ o 1 + n-^Qu . ( 6 ) 
P3, = n-« {(wf - m)SPl2 + nnlS{PioP'oi + 6P5iP{ i + 3Pi„Pij)l 
+ n --6 {(nl - 1).S(2PJ,B. + 3 P^ 2 ^.) + Snn^SiPloPiiB, + [P^„P ^2 
+ 4P;iPM.4.)| + n-M(n? + 1 )S(P 5 „P? + 6 P^iA,«. 

+ SP\^A\) - Q,2SPI, - QQuSPl, - 3 Q 20 SPI 2 1 + {Oil - l).S(3P^o^.P! 
-{- 6Pll4,P,- -|-Pj2^t) “f" 3 Qi 2<SP5 q -|“ 6Q21^PI I "p Qso!^P'o 2 1 “I” . (7) 

P„ = n-^ {(nj + n,)SPi 1 + 5(nl + nf + W 2 )*S(Pj oP{ . 

+ 2P5iP^o) - 10(2 rJ -M2)SPJiPJo + 30(3«? + «.,).S'P2oP^oPj 1 1 + n-'jOt? 
+ 1).S(PJ„«. + 5PUA,) + lOOii + im2P^oP^oB. + (2P5oPii 
+ 3PJiP|„).l.l - 10nn2.S*[2P3oP2%P, + ( 2 P]oPli + SPUPI^JA,]] 

+ 5n-* {{n\ - 1)S{P\,A,B, + 2PUAI) + Qnn^SiPioPLoA.B, + 2P-„Pi,^:) 
+ QnSiPl 0 “t“ 6P2 0P2 0) "P 2Q2 oS(P3 I -p GP.] flP 1 1) 1 "p 10/1“'* { (?tj 
+ l)S(P^oAU^. + PhA^.) - Q 2 iSP*o - Q,„BP .], , I + r>irM(ii? - l)S(2P^oA^.B, 
+ Pi,At)+ 2 QsiSP^:o + Q 4 oBPi , ) + u HJ,, . ( 8 ) 

P 42 = n-^int + n,)SPl, + (nf + vf +n,)S(PloPi2 + m,Pi, 

+ 6 Pj 2 Pio) + 4(2n'J - W 2 )S(P 5 o/'/o + 3P,J ,P^ ,) + G(3 r? + /i 3 )S(Pi pP^ oPo 2 
+ 4PJoP{.P^)! + 2 n '(Oi? + D.SXP^iP. + 2PUA,)+ 2 («.? + \)S[(2P\oPi , 
+ 3PJiP^„)P.+ a^^oP^2 +6P2lPJl +3Pl2PioM.] - 2««2«[(2 P^oP/, 

+ 3P‘ . P| o)P. +(P3 0 Pi 2 + 6PJ 1 Pi . + 3P1 2 PJ 0 )^, 1 1 + { (nt - l)S(Pi „P? 

+ SPl,A,B, + QPlM +Qnn2S[PUP'2,Bl + 4PjoP} i^P. + (P 20 P 52 

+ 4PiiPj\M?] + Qo2S(Pio + 6P^„P^) + SQnSiP',, + 3P2„P(,) 

+ 6P2o-S(Pj2 + PloPi2 + 4P;iP^)l + 4«-M(«? + l)SiPUA,B\ 

+ dPi,A]B, + P{2AI) - Q 12 SPJ 0 - 3 Q 2 .SP 5 , +QJo-SPl2j 
+ n-^S(QPioA^.Bl+8PirAUii + PJjAt) + Q,oSP^o2 + SQs.SPIi 
+ 6Q22-SPi„} +n-‘Q42. (9) 

p33 — { (wj -p n^SP\ 3 -p 3 (r{ "P "P f^^S{P 3 1 P{, 2 "I" 3P 2 2 PI 1 

+ P;3p^o) - (2nt - n^) S(PUPL A- OP^iPi,) -h9(3nt +ns)S(P^oPixPo2 

* The repetition of this expression signifies that A and B factors are coupled only with those P factors which have corresponding indices.
+ +3n-^{(nt + 1)S{P-,,B, + P^.A,) + (n'J + 1) SKPUP’o^

+ &Pl,P\, +W\,PU)B, + iPo3PU +&PUPii +^PliPi,2)A,] 

— nniS[{PlgPl,^ + QPiiP'i 1 + 3PJ iPia^B) -f- (PJ o + ^P'nPi i 
+ 3Pl,PiM^] + 3n-M(nl - l)S(P^ifi? + 3Pi,yl.P. + 

+ 3n,«^S[PJoP{,P? + (PUP^2 +4P;,Pl,M.B. + P;iPi2A!)] 

+ maCPai +3PJoP.S) + 3(3„(Pi, + Pi„P', + 4P; ^Pj ,) + Q,oiP{, 

+ 3P;2P|i)] 1 +n-MW + 1)*SI(P5„P-] +9PiiA.P“ +9P;,A!A. + PS3An 

- SiQosPl, +9QnPii +9Q2,P;2 +Q3oPi3)| +3n-»{(«‘f - l)5(Pj„A.P’ 

+ 3p;.a*b! + p; 3 A':p,) + .s((^pJo + 3Q22Pj, +Q3iPi2)l +«--'(333 . (lo) 

Chapter III. The Mathematical Expectation of the Variance of $p_{ab}$

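1. The Symbols ${}_2m_{p_{ab}}$ and ${}_2M_{p_{ab}}$. Denoting the variance of $p_{ab}$ by ${}_2m_{p_{ab}}$ and the mathematical expectation of ${}_2m_{p_{ab}}$ by ${}_2M_{p_{ab}}$, we have the definition,

$${}_2m_{p_{ab}} = \big[n^{-1}S(x_i-\bar x)^a(y_i-\bar y)^b-\bar p_{ab}\big]^2 = n^{-2}\big[S(x_i-\bar x)^a(y_i-\bar y)^b\big]^2-2n^{-1}\bar p_{ab}\,S(x_i-\bar x)^a(y_i-\bar y)^b+\bar p_{ab}^2,$$

and

$$\begin{aligned}{}_2M_{p_{ab}} &= E({}_2m_{p_{ab}}) = E\big\{n^{-2}\big[S(x_i-\bar x)^a(y_i-\bar y)^b\big]^2-2n^{-1}\bar p_{ab}\,S(x_i-\bar x)^a(y_i-\bar y)^b+\bar p_{ab}^2\big\}\\ &= n^{-2}E\big[S(x_i-\bar x)^{2a}(y_i-\bar y)^{2b}\big]+2n^{-2}E\big[S(x_i-\bar x)^a(x_j-\bar x)^a(y_i-\bar y)^b(y_j-\bar y)^b\big]\\ &\quad-2n^{-1}\bar p_{ab}\,E\big[S(x_i-\bar x)^a(y_i-\bar y)^b\big]+\bar p_{ab}^2\\ &= n^{-1}\bar p_{2a,2b}+2n^{-2}E\big[S(x_i-\bar x)^a(y_i-\bar y)^b(x_j-\bar x)^a(y_j-\bar y)^b\big]-\bar p_{ab}^2. \qquad (3.11)\end{aligned}$$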
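The only algebra behind (3.11) is the expansion of a squared sum. As a quick check, the sketch below (an illustration, not from the original) treats $z_i = (x_i-\bar x)^a(y_i-\bar y)^b$ for one fixed sample as plain numbers and compares both sides exactly with `fractions.Fraction`. It reads the S in the cross term as running over unordered pairs $i < j$, which is what the factor 2 indicates; taking expectations of the right-hand side, with $E[n^{-1}S z_i] = \bar p_{ab}$, then gives (3.11). The function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def expansion_3_11(z, pbar):
    """Return both sides of the identity behind (3.11) for one sample.

    z[i] stands for (x_i - xbar)^a (y_i - ybar)^b, and pbar for the constant pbar_ab.
    """
    n = len(z)
    sample_p = Fraction(sum(z), n)                          # p_ab = n^-1 S z_i
    lhs = (sample_p - pbar) ** 2                            # the squared deviation 2m_{p_ab}
    squares = sum(zi * zi for zi in z)                      # S z_i^2, i.e. n * p_{2a,2b}
    cross = sum(zi * zj for zi, zj in combinations(z, 2))   # S over pairs i < j
    rhs = (Fraction(squares, n * n) + Fraction(2 * cross, n * n)
           - 2 * pbar * sample_p + pbar * pbar)
    return lhs, rhs


lhs, rhs = expansion_3_11([3, -1, 4, 1, -5], Fraction(2, 3))
print(lhs == rhs)
```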

Before attempting to expand the right hand side of (3.11) for any values a, b, we shall derive the formula for ${}_2M_{p_{11}}$ to illustrate the procedure.

2. The Mathematical Expectation of ${}_2m_{p_{11}}$. By (3.11) we have

$${}_2M_{p_{11}} = n^{-1}\bar p_{22}+2n^{-2}E\big[S(x_i-\bar x)(y_i-\bar y)(x_j-\bar x)(y_j-\bar y)\big]-\bar p_{11}^2. \qquad (3.21)$$

The first term is given by (4) and the last by (1). The only new term is the 
middle one. To expand it let us write it in terms of U and V. We then have: 

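$$\begin{aligned} n^{-2}S\,E\big[(x_i-\bar x)(y_i-\bar y)(x_j-\bar x)(y_j-\bar y)\big] &= n^{-2}S\,E\big[(U_i+A_i)(V_i+B_i)(U_j+A_j)(V_j+B_j)\big]\\ &= n^{-2}S\,E\big[U_iV_iU_jV_j+(U_iV_iU_jB_j+U_jV_jU_iB_i)+(U_iV_iV_jA_j+U_jV_jV_iA_i)\\ &\quad+(U_iV_iA_jB_j+U_jV_jA_iB_i)+(U_iV_jA_jB_i+U_jV_iA_iB_j)+U_iU_jB_iB_j+V_iV_jA_iA_j\\ &\quad+4\text{ vanishing terms}+A_iB_iA_jB_j\big]. \qquad (3.22)\end{aligned}$$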
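A mechanical check that the grouping in (3.22) accounts for all sixteen products of the four binomials: the sketch below (an illustration with hypothetical function names) compares the full product with the grouped terms plus the four terms described as vanishing, for every assignment of a few sample integers to the eight symbols. The four remaining terms are exactly those containing a single U or V factor, whose expectation is zero.

```python
from itertools import product


def expand_3_22(Ui, Vi, Uj, Vj, Ai, Bi, Aj, Bj):
    """Return the full product, the grouped terms written out in (3.22), and the
    four remaining terms (each has exactly one U or V factor)."""
    full = (Ui + Ai) * (Vi + Bi) * (Uj + Aj) * (Vj + Bj)
    grouped = (
        Ui * Vi * Uj * Vj
        + (Ui * Vi * Uj * Bj + Uj * Vj * Ui * Bi)
        + (Ui * Vi * Vj * Aj + Uj * Vj * Vi * Ai)
        + (Ui * Vi * Aj * Bj + Uj * Vj * Ai * Bi)
        + (Ui * Vj * Aj * Bi + Uj * Vi * Ai * Bj)
        + Ui * Uj * Bi * Bj
        + Vi * Vj * Ai * Aj
        + Ai * Bi * Aj * Bj
    )
    vanishing = Ui * Bi * Aj * Bj + Ai * Vi * Aj * Bj + Ai * Bi * Uj * Bj + Ai * Bi * Aj * Vj
    return full, grouped, vanishing


def check_3_22(values=(-2, 1, 3)):
    """True if full == grouped + vanishing for every assignment of `values` to the 8 symbols."""
    for symbols in product(values, repeat=8):
        full, grouped, vanishing = expand_3_22(*symbols)
        if full != grouped + vanishing:
            return False
    return True


print(check_3_22())
```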
The evaluation of the last term is very simple. For
and from the elementary theory of symmetric functions we have: 


Hence 




(3.23) 
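The equations that followed "For" and "Hence" are missing from this copy, so (3.23) itself is not restored here. The step credited to the elementary theory of symmetric functions is presumably the identity $\sum_{i<j}A_iB_iA_jB_j = \tfrac12\big(Q_{11}^2-Q_{22}\big)$, with $Q_{qr} = S\,A_i^qB_i^r$ as used earlier in the paper. The sketch below only checks that identity on sample numbers; it is a reading aid under that assumption, not a reconstruction of (3.23).

```python
from fractions import Fraction
from itertools import combinations


def pair_sum_direct(A, B):
    """S over pairs i < j of A_i B_i A_j B_j, term by term."""
    return sum(A[i] * B[i] * A[j] * B[j] for i, j in combinations(range(len(A)), 2))


def pair_sum_from_Q(A, B):
    """The same sum from power sums: (Q_11^2 - Q_22) / 2, where Q_qr = S A_i^q B_i^r."""
    Q11 = sum(a * b for a, b in zip(A, B))
    Q22 = sum((a * b) ** 2 for a, b in zip(A, B))
    return Fraction(Q11 * Q11 - Q22, 2)


A, B = [1, -2, 3, 0], [2, 5, -1, 4]
print(pair_sum_direct(A, B), pair_sum_from_Q(A, B))
```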


To expand the first term and also the remaining ones, we return to the u, v notation defined in Chapter I. We then write

$$S\,E(U_iV_iU_jV_j) = n^{-4}S\,E\big[(n_1u_i-u_1-\cdots)(n_1v_i-v_1-\cdots)(n_1u_j-u_1-\cdots)(n_1v_j-v_1-\cdots)\big].$$

The only terms which can appear in the expansion of the right hand side of the last equation have the following forms: $E(u_i^2v_i^2)$, $E(u_i^2v_j^2)$, $E(u_iv_iu_jv_j)$, i.e., exactly those which appear in the evaluation of $\bar p_{22}$.* Remembering the symmetry, there will be no loss in generality if we take for i and j the integers 1 and 2. To find the coefficients of the three characteristic terms, the above summation may be broken up as follows:

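$$\begin{aligned} n^4S\,E(U_iV_iU_jV_j) &= E\big[(n_1u_1-u_2-\cdots)(n_1v_1-v_2-\cdots)(n_1u_2-u_1-\cdots)(n_1v_2-v_1-\cdots)\big]\\ &\quad+E\Big\{\big[(n_1u_1-u_2-\cdots)(n_1v_1-v_2-\cdots)+(n_1u_2-u_1-\cdots)(n_1v_2-v_1-\cdots)\big]\sum_{j=3}^{n}(n_1u_j-u_1-\cdots)(n_1v_j-v_1-\cdots)\Big\}\\ &\quad+\sum_{i,j=3}^{n}E\big[(n_1u_i-u_1-\cdots)(n_1v_i-v_1-\cdots)(n_1u_j-u_1-\cdots)(n_1v_j-v_1-\cdots)\big]. \qquad (3.24)\end{aligned}$$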
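The breakup (3.24) sorts the pairs of subscripts into three groups: the single pair (1, 2), the pairs with exactly one index in {1, 2} (the middle term), and the pairs with both indices 3 or more (the last term). A small count, added here as an illustration and reading the summation convention of the footnote as $i < j$, confirms that these groups exhaust the $\tfrac12 n(n-1)$ pairs exactly once. The function name is hypothetical.

```python
from itertools import combinations


def classify_pairs(n):
    """Count the pairs i < j of 1..n by how many of the two indices are 1 or 2."""
    counts = [0, 0, 0]  # counts[s] = number of pairs with exactly s indices in {1, 2}
    for i, j in combinations(range(1, n + 1), 2):
        counts[(i <= 2) + (j <= 2)] += 1
    return counts


for n in range(3, 8):
    none, one, both = classify_pairs(n)
    print(n, both, one, none, n * (n - 1) // 2)
```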


Writing the three terms in a row and their coefficients from the three parts of 
(3.24) in columns below these terms, we get the following scheme: 

Eiulvl) E{xi\v\ + £(wit;i ^2*^2) 

{n\ + 1)^ 


+ 1 ) 


— 2niri2 


2^2 


^2/^3 

2 


n27lz 

"2” 


2/12 n3 


nni(2/2i — 1) 
2 


— nw3 

2 


Total 

coeff. 


■“ 3/ij -j” 3). 
With the aid of the above equations we finally get:


SEiU,V,U,V,) = SPl 


riKi 


SPl,P 


20' 02 


+ n('«nj — 3 n 2 )SPI,P’, , 


Proceeding in the same way we find:

mU.V.UA + V,V,lhB,) = n-K2n\ + 

SE{U,V,V,A, + U,VA\A,) = /r’(2«? + n,)SP\,A, 

SE(U,V,AA + V,V,AM = -nn,SP[^A,B, + (n* + r>,)Q,,SP], 
SEiU,V,AA+l'jV,A,B,) = 2«.sp;. - Q„sp;, 

SEiUMAB, + y.y;.4..4,) = nSiPlA", + PJ,.4^) - hSiQ^Al^ + Qo^PU)- 
Collecting terms and simplifying we finally get: 

sJl/pn = /i-M»?‘?/"22 + SiPlAU + 2PlAii) - 
+ 2n-hi,{SiPUB, + P;2-4.)! + n-^S{P',oBl + 2P\,AA + P52'4;)}. (11) 


Corollary 1. In case $X_i = Y_i$, i.e., when the populations are univariate, (11) becomes

= n-'fn^SIPJo - {PI,)-] + 4,S’PJoPi„) + 4n-’/7, .S’P^„.l, + An-^SPioA\. 

(11')

This is Tchouproff's formula for the expected value of the variance of samples of

Corollary 2. In case the n populations are identical (11) becomes

2Mp„ = + - n,P\,]. (11")” 


3. The Mathematical Expectation of ${}_2m_{p_{ab}}$. We now return to the general equation

$${}_2M_{p_{ab}} = n^{-1}\bar p_{2a,2b}-\bar p_{ab}^2+2n^{-2}\sum_{i=1}^{n}\sum_{j=1}^{n}E(x_i-\bar x)^a(y_i-\bar y)^b(x_j-\bar x)^a(y_j-\bar y)^b. \qquad (3.11)$$


* Since $E(u_i^2v_i^2) = P_{22}$, $E(u_i^2v_j^2) = P_{20}P_{02}$, etc.

See Biometrika, Vol. XIII, p. 295.

See Biometrika, Vol. XXI, p. 234, Cor. 1.
The first two terms are given by (5). To evaluate the last term we write:

SEKxi - xHvi - ynx, - xYiy, - y)‘] = SE[{U, + A,y{V, + B.nU, + A,y 

(.V, + B,y] = sE{u:v':u‘;v]) + “'“s'' 

SE{U'lV\Vy]A\^B\KVyBY) = n-2(“+wS£j(n,u. )“(niP. 

1 

(mu, )»(n,p, )‘ + S . . . cJ^SE[(n,M. )» 

'1 ' i 

(nit). )^(n:W; - • • .)'’(nii'; YA'^B'^A'^By, (3.31^ 

where $\alpha = a - r_1$, $\beta = a - r_2$, $\gamma = b - r_3$, $\delta = b - r_4$.


The right hand side of (3.31) has been broken up into two parts because the first part is symmetrical, while the second part, in general, is not, except when $r_1 = r_2$ and $r_3 = r_4$.

Let us now consider the expression

$$S\,E\big[(n_1u_i-\cdots)^a(n_1v_i-\cdots)^b(n_1u_j-\cdots)^a(n_1v_j-\cdots)^b\big]. \qquad (3.32)$$

This is a double summation in which $c_{ij} = c_{ji}$, and in which the diagonal terms, $c_{ii}$, are missing.

Consider next a general term of k factors from the expansion of each bracket of (3.32). As we are dealing with symmetric functions, there will be no loss in generality if we consider the first k subscripts only; and if we let the lower limits of the exponents of the u's and v's begin with zero we may consider that each parenthesis of a given bracket contributes exactly k factors. Such a term, omitting the coefficient, may be written as follows:


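$$E\big(u_1^{\alpha_1}v_1^{\beta_1}u_1^{\alpha'_1}v_1^{\beta'_1}\cdots u_k^{\alpha_k}v_k^{\beta_k}u_k^{\alpha'_k}v_k^{\beta'_k}\big) = \prod_{h=1}^{k}E\big(u_h^{\alpha_h+\alpha'_h}v_h^{\beta_h+\beta'_h}\big) = \prod_{h=1}^{k}P^{h}_{(\alpha_h+\alpha'_h)(\beta_h+\beta'_h)}. \qquad (3.33)$$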


This term occurs in every one of the $\tfrac12 nn_1$ brackets of (3.32), having the same numerical coefficient in every one of them, which is

$$\frac{(a!)^2(b!)^2}{\Pi\alpha_h!\,\Pi\alpha'_h!\,\Pi\beta_h!\,\Pi\beta'_h!}. \qquad (3.34)$$
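The numerical coefficient (3.34) is a product of four multinomial coefficients, one for each of the four parentheses of a bracket, raised to the powers a, b, a, b; the factors $n_1$ and $-1$ inside the parentheses are accounted for separately in the coefficient in $n_1$. The sketch below is an illustration with hypothetical function names: it counts by brute force how often a given monomial appears when a power of a sum is multiplied out, compares that count with $a!/\Pi\alpha_h!$, and assembles the product in (3.34).

```python
from collections import Counter
from itertools import product
from math import factorial, prod


def count_term(exponents):
    """Brute-force count of how often u_1^e_1 ... u_k^e_k arises when a product of
    sum(exponents) factors, each (u_1 + ... + u_k), is multiplied out."""
    k, degree = len(exponents), sum(exponents)
    target = Counter({h: e for h, e in enumerate(exponents) if e > 0})
    return sum(1 for word in product(range(k), repeat=degree) if Counter(word) == target)


def multinomial(exponents):
    """degree! / (e_1! e_2! ... e_k!)."""
    return factorial(sum(exponents)) // prod(factorial(e) for e in exponents)


def coefficient_3_34(alphas, alphas_p, betas, betas_p):
    """(a!)^2 (b!)^2 / (Pi alpha_h! Pi alpha'_h! Pi beta_h! Pi beta'_h!), as in (3.34)."""
    a, b = sum(alphas), sum(betas)
    assert sum(alphas_p) == a and sum(betas_p) == b, "restrictions (a) and (b)"
    denominator = prod(factorial(e) for e in (*alphas, *alphas_p, *betas, *betas_p))
    return factorial(a) ** 2 * factorial(b) ** 2 // denominator


print(count_term((2, 1, 0)), multinomial((2, 1, 0)))
```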


To obtain the total coefficient in $n_1$ of (3.33) we break up (3.32) into the following partial summations:

JB[(nitt, — • • • )‘‘(niP, - . . . Y{n\u, — • • • )“(niP, — •■•)*] = E[{niU\ — • • • )* 
(njPi - - Y (wiWj - ■ • • YiriiVi - • • • )‘] + • • • + E[{niUk-i - • • • )“ 
(wiVn.-! — •••)'' — ••• YiniVk — •••)*]+ S — •••)*'

(mt), -•••)* ‘S' («iMj — • • • Xwi'’; — • • • )‘ I + <S [i/{(wiu. - ••• )“ 

(h,j., _ . . . Yiniu, — y(niv, — •••)*}]• 

From this equation we get for the total coefficient in $n_1$ of the term (3.33) the following expression:

h,h'^l A -I 

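The following restrictions on the α's and β's must be observed.
(a) $\alpha_1+\alpha_2+\cdots+\alpha_k = a$, $\quad\alpha'_1+\alpha'_2+\cdots+\alpha'_k = a$
(b) $\beta_1+\beta_2+\cdots+\beta_k = b$, $\quad\beta'_1+\beta'_2+\cdots+\beta'_k = b$
(c) $\alpha_h+\alpha'_h+\beta_h+\beta'_h > 1$.

From (c) we obtain the upper limit of k, namely: $t = a + b$, since the total degree $2a + 2b$ is shared among k subscripts, each of which must carry at least 2. Combining the various above equations we finally obtain: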

(y^)2(«4-6) = (a !)2(6 !)2 .S s' 

ah,a[„lih,0'.>-=O 


t 

s 

k^l {h,h'=-l 


S (-/u) 


«/i t /3/i-f a 




« . + . 


+ 0- 


II 




na„ ! Ilal ! 11/3, ! IT/ 3 ; ! ’ 


nA 

(3.35) 


Turning to the second part of (3.31) let us consider the expression 

S - . . . ) {nii\ - . . . ) (muj - . . . ) {mvj - . . . ) 

for a given set of r's. The term (3.33) may also be considered as a general term of this last expression; of course, the exponents of the u's and v's will be different in this case. In order to evaluate the complete coefficient of a term like (3.33) we again write:

SFAiniU. Hfhv, Hmv, y A:^B7A]%^ 

= E[{niHi ~ . )“(??irj - . . . )>(«iWo ~ . y{7iiV2 - • . • yA[^B[^A[^Bl^\ 

+ E[niU2 - • • . y{niV2 - • • • ).^(/iiiq - . . . )^(/uvi - . . . yA2^B\^A[^B\^ 

+ • • • + E[{nxUk - • • • y{n\Vk ~ • y{tixUk-i ~ • y{niVk-\ - • • • )* 
avbiwh, Biu] + s E[{mu Hn,v. — S

> = 1 V'-A-l 1 

(«,M, y{n,v, yA';B';] + S EUn^u, )« 

A]%^ S (n,M. )“(n,p. S E[{n,u, )» 

t“Ar + l t , 4-1 


n^v, )y{mu, y(n,v, yA?B7A]%*]. 


(3.36) 


It is now quite easy to write down the complete coefficient of a term of the form (3.33). The numerical coefficient of this term is the same in every bracket of (3.36), and is

$$\frac{(a-r_1)!\,(a-r_2)!\,(b-r_3)!\,(b-r_4)!}{\Pi\alpha_h!\,\Pi\alpha'_h!\,\Pi\beta_h!\,\Pi\beta'_h!}. \qquad (3.37)$$


The coefficient in $n_1$ and in the A's and B's is broken up by (3.36) into the following four parts:

I. S (~ni) from the first fc(fc — 1) brackets. 


h 


II. s '*''4;. ifi;* s 4;,?/?;4= s {-ny'^’‘''AyBy 

A**! /.'“A, 4-1 /»- 1 

^Qnr,- 'k^A'.yB],^, 

from the next $k(n-k)$ brackets. Similarly

III. S (-niy'''^^'‘'A',mii\Qrtr 3 - S .4;'/47l, from the next k{n - k). 

h'=l L //==1 J 

And finally: 

IV. S A\^B\^A)^B]^ = SAI^B]/SAI^BI^ - 

1 1 1 

- S Al^B'yA’JB'.i - S A'yB'y S AIW'J- S A'JBli s AyBy 


, « y, ^ ^ A 


/l«l 


/r=i 


h=l 


h”l 


+ 2 S AVB'y S AlWlf = Qr,„Qr,n — Q(r]-f >2)(»34'»-4) ““ Qrira 

;t-i /I'-i 1 


^ Qr^uSAl^Bl^ ^ S A^,^Bl^AlWli + 2 S Al^B[^ S from the 

1 h,h'^i /i»i 

last c!* brackets. 
The restrictions on the α's and β's differ from those given above in that a is replaced by $a - r_1$ and $a - r_2$, and b by $b - r_3$ and $b - r_4$; and from the restriction (c) we get for the upper limit of k, in this case,

$$u = \frac{\alpha+\beta+\gamma+\delta}{2} = a+b-\frac{r_1+r_2+r_3+r_4}{2}$$

when $\sum r_i$ is even, or the greatest integer less than $(\alpha+\beta+\gamma+\delta)/2$ when $\sum r_i$ is odd.

1 ^ 

Combining (3.37) with we get for the general numerical coefficient 

in the expansion of the second part of (3.31), the expression 

_ (~l ).SV.(a !)H/>!)^ _ 

By an obvious manipulation we have 

I + II + II1 + IV= S + 

a,/('=iL J 

- S S S AVBl^ 

A-l /j = l L J h~l 


S r (-«,) - 1 Al^m* + QnnQnr, - Qi 

L 


»’l f '■2)(r3-fr4) • 


(3.38) 


Finally, combining the various equations we get the formula:


a , b 


+ 2(«)--«“+‘+i' (a !)K^' 0* S S S 




S (— -f 7u. S 4- + CJ* 

/i./i'-l fi-l 

(q:;, 4- Q^/t) (Ph + 0h) , 2(,j)-2(tt+6+i) (a^YHb^y^ S S 

UahllWklnotlWWll ' * 1 r,.r,r,.r,= 0 11 ^! 




aA+/3/*+a ,4-/9 


s s < s i(-7u) - 1 ] a;, 

- s - 1 ] S is [(-770«>;-l] 

7.-1 A-l 7i“l 

S AlBl^ + Qr,r, is[(-770"^‘-^'^^ -1] 

;.“i /.=-i 
4“ Qrlrz S [( — /li) ^ ^ 1] “1“ QnnQr2r\ — Q(ri+r2) (r34-r4)

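In case the n populations are identical the second part of (12) must vanish, and in the first part the summations

$$\sum_{j_1=1}^{n}\cdots\sum_{j_k=1}^{n}\prod_{h=1}^{k}P^{j_h}_{(\alpha_h+\alpha'_h)(\beta_h+\beta'_h)} = \frac{k!\,C^n_k}{l_1!\,l_2!\cdots l_c!}\prod_{h=1}^{k}P_{(\alpha_h+\alpha'_h)(\beta_h+\beta'_h)},$$

where $l_1, l_2, \cdots, l_c$ are the numbers of repetitions of the pairs of integers $(\alpha_1+\alpha'_1)(\beta_1+\beta'_1), \cdots, (\alpha_k+\alpha'_k)(\beta_k+\beta'_k)$, respectively.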

We then have the following 

Corollary: The mathematical expectation of the variance, ${}_2M_{p_{ab}}$, of the product moment, $p_{ab}$, in samples of n from a single infinite population is given by

= P2«26 - vl ,+ 2(a0-2(«-h+i) (a!)2(6!)2 "V’' 


U\h\ - ’Ll 


h,h'^l fi^l 


II B(ah + of / t ) (Ph + ^ a ) 

+ (__n,)«A-^^] + ' 


(12')


4. The Formula for ${}_2M_{p_{21}}$. Formula (12) can by no means be used mechanically. It does, however, summarize to a great extent the details in finding ${}_2M_{p_{ab}}$ for any given values a, b. Formulae for ${}_2M_{p_{21}}$ and ${}_2M_{p_{31}}$ have been obtained, but the one for ${}_2M_{p_{31}}$ is too long to be included in the paper, especially since with a little work it can be easily derived by applying (12). The one for ${}_2M_{p_{21}}$ is given immediately below.

2^2,2, = n~nn\n\S[P \2 ~ (P^y] + nlS[P\oPl 2 + ^(PloPi^ ~ n 2 Pi,Pi,)] 
- 2 nln,SPi,Pi,+{nl+ 2 )S(,PUPioPU+ 8 PioPliPii) + QSPloPioPU\ 
+ 2 n-^[mnlS{P\,B, + 2 PUA, - Pl.Pi^B, - 2 Pi,PUA,) 

- irHniS{PUB,Pl, + P{,A,Pio) - 2n,S[n5P + 2(2n* - 3 )PJi^P{i] 
+ 6 nSPi,P{,A, + 4 n,S(PUPiiB, + Pt^Pi^Ai + P\,A,PU + 2 Pi,Pi,B, 
-pPUPiM +n-*[n\S[P\,B\ - (P*„P.)*] + 4 SP‘„P|„(P. + B,)» 

+ 3 («* + 7h)SPUA\ + ^SPUPiAA. + A,y - 2n,S[PUPUA] 
+ 2P|iPj,Ui + A,)^] + 16SP;i4.Pi,^ - 4ni5(P*i.4.)»

+ 4(2n2 + nt)SPiiAtBt — iriiSPl i^.B.Pjo — SnsSPnPioAiBj 
+ 8S(P;,P.Pjo^; + P[iA,PUB,) - 4n|SPii^.P^„B. 

— 2nin4n“‘/S(QjoP2 2 + 2QnPii) + 2n2n“'<S[6QiiP{ ,P^o 

+ Q^iPioPL + ^PiiPit)]] +2n-*{nn,S(2PUA,B] +2P[al + ^PiiA^iB,) 

- riiSmiPlxBi + P|2^.) + 2QxxSPIoB^ + 2P‘ j^.)]} + n-'fn^SlPi*^'. 

+ 4(P‘o^?B? +PIi^*P.)] -2nS[(Q2o^.P. + QiM*)PJi +QnP5o^P. 

-f- Q 20 P 0 2 -'^ >] "f" 'SlQl flPo 2 "t" 4Q2 o(QiiPi 1 "I" Ql 1 P 20 )]) • (13)''* 

Chapter IV. The Mathematical Expectation of the Third Moment of $p_{11}$

1. The Mathematical Expectation of ${}_3m_{p_{11}}$. Following the notation of the last chapter we shall denote the third moment of $p_{11}$ about its mean by ${}_3m_{p_{11}}$ and the mathematical expectation of ${}_3m_{p_{11}}$ by ${}_3M_{p_{11}}$. We have then, by definition,

$${}_3m_{p_{11}} = \big[n^{-1}S(x_i-\bar x)(y_i-\bar y)-\bar p_{11}\big]^3,$$

and by a well known formula we have:

$${}_3M_{p_{11}} = \overline{p_{11}^3}-3\,{}_2M_{p_{11}}\,\bar p_{11}-\bar p_{11}^3. \qquad (4.11)$$
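The well known formula used in (4.11) expresses a third central moment through the raw third moment, the variance and the mean: $E(p-\bar p)^3 = E(p^3)-3\bar p\,\mathrm{Var}(p)-\bar p^3$. A small exact check on an arbitrary discrete distribution, whose made-up values merely stand in for the distribution of $p_{11}$ (the function name is hypothetical):

```python
from fractions import Fraction


def third_moment_two_ways(dist):
    """dist: list of (probability, value) pairs for a statistic such as p_11.

    Returns E(p - pbar)^3 computed directly and via E(p^3) - 3 Var(p) pbar - pbar^3.
    """
    assert sum(p for p, _ in dist) == 1
    mean = sum(p * v for p, v in dist)                   # pbar_11
    var = sum(p * (v - mean) ** 2 for p, v in dist)      # analogue of 2M_{p_11}
    direct = sum(p * (v - mean) ** 3 for p, v in dist)   # analogue of 3M_{p_11}
    raw_third = sum(p * v ** 3 for p, v in dist)         # analogue of E(p_11^3)
    return direct, raw_third - 3 * var * mean - mean ** 3


dist = [(Fraction(1, 2), Fraction(1)), (Fraction(1, 3), Fraction(-2)), (Fraction(1, 6), Fraction(5))]
print(third_moment_two_ways(dist))
```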

The last two terms of (4.11) are given by (1) and (11). To evaluate $\overline{p_{11}^3}$ we write:

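$$\begin{aligned}\overline{p_{11}^3} &= E\big[n^{-1}S(x_i-\bar x)(y_i-\bar y)\big]^3 = n^{-3}S\,E(x_i-\bar x)^3(y_i-\bar y)^3\\ &\quad+3n^{-3}S\,E(x_i-\bar x)^2(y_i-\bar y)^2(x_j-\bar x)(y_j-\bar y)\\ &\quad+6n^{-3}S\,E(x_i-\bar x)(y_i-\bar y)(x_j-\bar x)(y_j-\bar y)(x_k-\bar x)(y_k-\bar y).\end{aligned}$$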
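The coefficients 1, 3 and 6 in the expansion above come from cubing a sum. Writing $z_i = (x_i-\bar x)(y_i-\bar y)$, the identity is $(S z_i)^3 = S z_i^3 + 3\sum_{i\neq j}z_i^2z_j + 6\sum_{i<j<k}z_iz_jz_k$, reading the middle summation as over ordered pairs and the last over unordered triples. A quick exact check on sample integers (an illustration; the function name is hypothetical):

```python
from itertools import combinations, permutations


def cube_two_ways(z):
    """Return (S z_i)^3 and S z_i^3 + 3 S_{i != j} z_i^2 z_j + 6 S_{i<j<k} z_i z_j z_k."""
    idx = range(len(z))
    cubes = sum(z[i] ** 3 for i in idx)
    squared_times_other = sum(z[i] ** 2 * z[j] for i, j in permutations(idx, 2))
    triples = sum(z[i] * z[j] * z[k] for i, j, k in combinations(idx, 3))
    return sum(z) ** 3, cubes + 3 * squared_times_other + 6 * triples


print(cube_two_ways([2, -1, 3, 5, -4]))
```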

The first term is simply $n^{-2}\bar p_{33}$, which is given by (10). The evaluation of the second term is not essentially different from the evaluation of the left hand side of (3.22), and since all details have been given there we shall omit them here. To evaluate the last expression let us write:

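$$\begin{aligned} S\,E(x_i-\bar x)(y_i-\bar y)(x_j-\bar x)(y_j-\bar y)(x_k-\bar x)(y_k-\bar y) &= S\,E\big[(U_i+A_i)(V_i+B_i)(U_j+A_j)(V_j+B_j)(U_k+A_k)(V_k+B_k)\big]\\ &= S\,E(U_iV_iU_jV_jU_kV_k)+S\,E(U_iV_iU_jV_jU_kB_k)+\cdots+S\,E(A_iB_iA_jB_jA_kB_k). \qquad (4.12)\end{aligned}$$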


In case the n populations are identical this reduces to one of Pepper's formulae, Biometrika, Vol. XXI, p. 238, Cor. 1.
As there is a great deal of similarity among the various terms of the right hand side of (4.12), it will not be necessary to go into the details of the expansion of every one of them. We shall, therefore, indicate the details for the expansion of only two of them, one symmetrical and one non-symmetrical; and as the first two terms are of that type we shall use these for the purpose of illustration. Using the u, v notation we have

$$S\,E(U_iV_iU_jV_jU_kV_k) = n^{-6}S\,E\big[(n_1u_i-\cdots)(n_1v_i-\cdots)(n_1u_j-\cdots)(n_1v_j-\cdots)(n_1u_k-\cdots)(n_1v_k-\cdots)\big].$$

The maximum number of subscripts appearing in any term evidently being 3, we 
can write without any loss in generality: 

$$\begin{aligned} S\,E\big[(n_1u_i-\cdots)\cdots(n_1v_k-\cdots)\big] &= E\big[(n_1u_1-\cdots)(n_1v_1-\cdots)(n_1u_2-\cdots)(n_1v_2-\cdots)(n_1u_3-\cdots)(n_1v_3-\cdots)\big]\\ &\quad+E\Big\{\big[(n_1u_1-\cdots)(n_1v_1-\cdots)(n_1u_2-\cdots)(n_1v_2-\cdots)+(n_1u_1-\cdots)(n_1v_1-\cdots)(n_1u_3-\cdots)(n_1v_3-\cdots)\\ &\qquad+(n_1u_2-\cdots)(n_1v_2-\cdots)(n_1u_3-\cdots)(n_1v_3-\cdots)\big]\sum_{i=4}^{n}(n_1u_i-\cdots)(n_1v_i-\cdots)\Big\}\\ &\quad+E\Big\{\big[(n_1u_1-\cdots)(n_1v_1-\cdots)+(n_1u_2-\cdots)(n_1v_2-\cdots)+(n_1u_3-\cdots)(n_1v_3-\cdots)\big]\\ &\qquad\times\sum_{i,j=4}^{n}(n_1u_i-\cdots)(n_1v_i-\cdots)(n_1u_j-\cdots)(n_1v_j-\cdots)\Big\}\\ &\quad+\sum_{i,j,k=4}^{n}E\big[(n_1u_i-\cdots)\cdots(n_1v_k-\cdots)\big]. \qquad (4.13)\end{aligned}$$
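The four parts of (4.13) correspond to sorting the triples of subscripts by how many of their indices lie in {1, 2, 3}: all three, exactly two (the three pairs from {1, 2, 3} combined with one index of 4 or more), exactly one, or none. A small count, added here as an illustration with a hypothetical function name, confirms that the parts cover all $C^n_3$ triples exactly once.

```python
from itertools import combinations
from math import comb


def classify_triples(n):
    """Count triples i < j < k of 1..n by how many of their indices lie in {1, 2, 3}."""
    counts = [0, 0, 0, 0]  # counts[s] = number of triples with exactly s indices in {1, 2, 3}
    for triple in combinations(range(1, n + 1), 3):
        counts[sum(t <= 3 for t in triple)] += 1
    return counts


for n in range(3, 8):
    print(n, classify_triples(n), comb(n, 3))
```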

The coefficients of the various terms arising in this expansion can now be 
found quite easily. For example, the coefficient of Pj 3 , which is, of course, the 
same as the coefficient of P 3 3 , is easily found to be 


+ 1 ) + ® 

A o 


nmrwi^ni — 2) 

6 


To evaluate the summation $S\,E(U_iV_iU_jV_jU_kB_k) = n^{-5}S\,E\big[(n_1u_i-\cdots)(n_1v_i-\cdots)(n_1u_j-\cdots)(n_1v_j-\cdots)(n_1u_k-\cdots)\big]B_k$ we break it up into partial summations as follows:

SE[(niU, — ... )(n]Vi — ... )(niu, — ... )(niv, ) (tiiUt — ... )5*] 

= E{ (niWi - . • . )(nii>i - • . • )[(niM2 — • • • )(niV2 — )(niU3 — ■•■)Bs 

+ (niUt — . • • )Bi(n}Ut — .. . )(niVs — •••)] + (wiMi — • • • )Bi(niM2 — • • • ) 
(n,»2 — • . • )(niUi — ... )(ni«3 — • • •)} + E{(,niUi — ... )iniVi — • • . ) 

[(niU2 — . . . )(nir2 - • • • ) + (wiWa — • • • )(nit)3 - •••)] + (wiw* - • • • ) 

(n,Vi — ... )(niU3 — . . • )(nit;3 - • • • )}S(niM, - ■ ■ ■ )Bi + E{(niUi — . . . ) 

4 

(niVi — ... )[(niU2 — )Bi + (mua — • • • )jB 3] + (niW* — • • • )(ni»2 — • • • ) 
[(njUi — ■ ■ • )Bi (niU, — ■ . ■ )J53] + (wiMs — . • . )(niV 3 — ... )

[(niui — ■ ■■ )Bi + (n\V2 — ■■■ )B2]j/S(niU/ — . . • ){niv, — • . . ) 

4 

+ E { (wiUi — . . • )(ni«i - • • • ) + («iM2 — • • • ) iniVi - • • • ) + (wiWa - ■■■) 
(niVs — • • •)} <S(niu. — • • • )(niv, — . . . )(niu, — ■•■)B, + E{iniUi — ■ • • )B, 

4 

+ (njM* - • • • + (niUs - ■■■ )B3}S(niU, - ... )(niVi — ... ) 

4 

(niu, — . . .)(niv, — ...)+ ESiuiU, — ... ){niv, — . • • ){niu, — . . .) 

4 

(mvj - . . . )(niUk - - ^Bk. (4.14) 


The expansion of (4.14) is not as difficult as it appears, for only two subscripts can appear in any term: the explicit appearance of the subscript 3 is due to the fact that we are dealing with a triple summation. We, consequently, do not need to expand those parentheses in which B appears.

We shall now, without any further details, state the final result, which is: 


= n-^{S[nlFl3 - PUPL + 3n,(PJ,P^2 + P^PL) + 3nj(n* + 2)Pl,P[, 

- Z(2n\ + 1)PUP[3 + 3n3P;iP^2P^o + 6(n? + 3nx - 2)PI,P{,PM 

- Zn,SPU[S{n\Pl, + PioPL - n^iPUY + 2P;,P{,)] - n?(SPl\)’]i 
+ 3n-HS[n\{PUB, + P‘,vl.) + 2«(PJ,P{ .P. + P^P!,^.) 

- 2n,iPl,Pl,B, + PUPM - 2n,{PUPl,B, + P^PM 
+ {PUPUB, + Pj.PL^.) - 2n,{PUPioB, + P^PhA) 

+ {PloPLB, + Po‘3PJoA) 1) + 3w-M-SK(Pj,P! + Pu^?) 

+ n,{PioPi,B] + PUPWA\) - {PUPloBl + Pl^PliA]) 

- 2(Pj„B,P{,B, + Pj2>l.P/i^,) + 2n,PUA,B, - 2PJ„B.Pi2^, 

- 2Pii^.pj\p, + 2n,pi,p',A.p. - 2(Pi,)“A.pj! + /i-m-s[(p^„p: + PJ 3 - 4 ?) 

+ 3{Pi^A,Bl + PUAW]\ . (14)>3 


Where a = nj + Wi + 1. 

This formula is shorter and simpler than the formula for ${}_2M_{p_{21}}$, although they are of the same order. This is due to the symmetry of ${}_3M_{p_{11}}$.


Chapter V. Product Moments of Trivariate and Quadrivariate Populations

1. Some additional definitions and notation. In this chapter we shall indicate briefly how the method of the previous chapters may be extended to populations of more than two variables. We shall do this by deriving some of the simpler formulae, corresponding to those of Chapter II, for trivariate and quadrivariate populations.

Cf. Biometrika, Vol. XXI, p. 253, formula (19).

The notation will be slightly changed in that we shall symbolize the new variables by priming the symbols for the variables used in the previous chapters. Thus, we shall indicate the trivariate population by $(X_k, Y_k, X'_k)$ and the quadrivariate population by $(X_k, Y_k, X'_k, Y'_k)$, and samples from such populations by $(x_k, y_k, x'_k)$ and $(x_k, y_k, x'_k, y'_k)$ respectively.

We shall denote by $P^m_{ijk}$ the product moment of the mth population of order i in X, j in Y, and k in X', and by $P^m_{ijkl}$ the similar product moment for a quadrivariate population. These are defined by the following equations:

$$P^m_{ijk} = E(X_m-a_m)^i(Y_m-b_m)^j(X'_m-c_m)^k \qquad (5.11)$$

$$P^m_{ijkl} = E(X_m-a_m)^i(Y_m-b_m)^j(X'_m-c_m)^k(Y'_m-d_m)^l \qquad (5.12)$$

where $a_m$, $b_m$, etc. are defined as in Chapter I part 2.

The sample product moments corresponding to $P^m_{ijk}$ and $P^m_{ijkl}$ will be denoted by $p_{ijk}$ and $p_{ijkl}$ respectively. They are defined by:

$$p_{ijk} = n^{-1}\sum_{m=1}^{n}(x_m-\bar x)^i(y_m-\bar y)^j(x'_m-\bar x')^k, \qquad (5.13)$$

$$p_{ijkl} = n^{-1}\sum_{m=1}^{n}(x_m-\bar x)^i(y_m-\bar y)^j(x'_m-\bar x')^k(y'_m-\bar y')^l. \qquad (5.14)$$

Finally we shall designate $E(p_{ijk})$ and $E(p_{ijkl})$ by $\bar p_{ijk}$ and $\bar p_{ijkl}$ respectively.

2. The Mathematical Expectation of $\bar p_{111}$ and $\bar p_{211}$. By definition we have

$$\bar p_{111} = E\big[n^{-1}S(x_i-\bar x)(y_i-\bar y)(x'_i-\bar x')\big]. \qquad (5.21)$$

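Applying the transformations (1.17) this equation becomes

$$\begin{aligned} n\bar p_{111} &= S\,E\big[(U_i+A_i)(V_i+B_i)(U'_i+C_i)\big] = S\,E(U_iV_iU'_i)+S\,E(U_iV_iC_i)\\ &\quad+S\,E(U_iU'_iB_i)+S\,E(V_iU'_iA_i)+\text{vanishing terms}+S\,E(A_iB_iC_i). \qquad (5.22)\end{aligned}$$

Since $E(A_iB_iC_i) = A_iB_iC_i$, $S\,E(A_iB_iC_i) = S\,A_iB_iC_i$. Following the previous notation we shall put $S\,A_iB_iC_i = Q_{111}$.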

When the expression $S\,E(U_iV_iU'_i)$ is expanded, no other non-vanishing terms
except those of the form $E(u_iv_iu'_i) = P^i_{111}$ can appear. The coefficient of this
term will evidently be the same as the corresponding coefficient in (2.23), namely: $n^{-2}n_1n_2$.
Whence:

$$S\,E(U_iV_iU'_i) = n^{-2}n_1n_2\,S\,P^i_{111}.$$

The three terms following the first of (5.22) are by (2.24) equal to

$$n^{-1}n_2\,S\big(P^i_{110}C_i+P^i_{101}B_i+P^i_{011}A_i\big).$$

We thus get:

$$\bar p_{111} = n^{-3}n_1n_2\,S\,P^i_{111}+n^{-2}n_2\,S\big(P^i_{110}C_i+P^i_{101}B_i+P^i_{011}A_i\big)+n^{-1}Q_{111}. \qquad (15)$$
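As an exact check of (15), the sketch below takes three small made-up trivariate populations (two support points each, probabilities summing to 1), draws one observation from each, computes $E(p_{111})$ by enumerating all joint outcomes with exact fractions, and then evaluates the right-hand side of (15). It assumes, as the expansion (5.22) implies, that $x_i-\bar x = U_i + A_i$ with $U_i = u_i-\bar u$, $u_i = x_i-a_i$ and $A_i = a_i-\bar a$, where $a_i$ is the mean of the ith population (likewise $B_i$ and $C_i$ for y and x'), and it reads $n_1 = n-1$, $n_2 = n-2$; the transformations (1.17) themselves are not reproduced in this section. The population data and function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical trivariate populations: lists of (probability, (x, y, x')) support points.
POPULATIONS = [
    [(Fraction(1, 3), (3, 1, 2)), (Fraction(2, 3), (0, 2, -1))],
    [(Fraction(1, 2), (1, 1, 1)), (Fraction(1, 2), (-1, 0, 3))],
    [(Fraction(1, 4), (2, -2, 0)), (Fraction(3, 4), (1, 1, 1))],
]


def population_mean(pop, coord):
    return sum(p * point[coord] for p, point in pop)


def P(pop, i, j, k):
    """Central product moment P_ijk of one population."""
    a, b, c = (population_mean(pop, t) for t in range(3))
    return sum(p * (x - a) ** i * (y - b) ** j * (z - c) ** k for p, (x, y, z) in pop)


def expected_p111(pops):
    """E(p_111) for one draw from each population, by exact enumeration."""
    n = len(pops)
    total = Fraction(0)
    for outcome in product(*pops):
        prob = Fraction(1)
        for p, _ in outcome:
            prob *= p
        points = [point for _, point in outcome]
        xbar, ybar, zbar = (Fraction(sum(pt[t] for pt in points), n) for t in range(3))
        p111 = Fraction(sum((x - xbar) * (y - ybar) * (z - zbar) for x, y, z in points)) / n
        total += prob * p111
    return total


def formula_15(pops):
    """Right-hand side of (15), with A_i, B_i, C_i = population mean minus average of means."""
    n = len(pops)
    n1, n2 = n - 1, n - 2
    means = [[population_mean(pop, t) for t in range(3)] for pop in pops]
    grand = [sum(m[t] for m in means) / n for t in range(3)]
    A, B, C = ([m[t] - grand[t] for m in means] for t in range(3))
    S111 = sum(P(pop, 1, 1, 1) for pop in pops)
    mixed = sum(P(pop, 1, 1, 0) * C[i] + P(pop, 1, 0, 1) * B[i] + P(pop, 0, 1, 1) * A[i]
                for i, pop in enumerate(pops))
    Q111 = sum(A[i] * B[i] * C[i] for i in range(n))
    return Fraction(n1 * n2, n ** 3) * S111 + Fraction(n2, n ** 2) * mixed + Q111 / n


print(expected_p111(POPULATIONS), formula_15(POPULATIONS))
```

Under these assumptions the two sides agree exactly, including the cross terms in $A_i$, $B_i$, $C_i$ that appear only when the population means differ.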

With the aid of the formulae of II, 3 we easily find the formula 
p212 ^ ^{(^1 l)'SP2ll + (2nJ + 2 00^^011 + 2PJio>SPl01 P 2 00^ 0 1 1 

- 2P;,oP{oi)l + n-^{(nl + l)>S(PJoiP. + PJioC, + 2PU,A,)} 

+ n^l(n\ - DSiPinAl + 2Plo,^A + 2Pi,o^.C. + P^oBA) + Q200SPL, 
+ 2 Qiio>S>PIoi + 2 Qioi^P 110 + QoLiBP 200) } "f" ^ ^Q211 • (Ifi) 


3. The Mathematical Expectation of $\bar p_{1111}$. The procedure for finding the formula for $\bar p_{1111}$ is very similar to the above. We shall therefore merely state the result.

Pllll = ^ — 1 )>SP nil + ( 2 /i-i + 100^0011 + ^1001^0110 

+ /^loio^^Sioi)} + n-^{(n? + l)S(PliioA + PUoA + P{onB, + PJnA)} 

+ rrWi + DSiPUooCA + PloioBA + pIuoAA + 


"h Poo\\AxB^ + PiQOiBJJ^ + S{QqqiiP 1100 + Qoioi^^ioio + QlOOl^OllO + QlOlO^Ol 


+ QllOO^OOll 


1 ) j n ^Qi 


Washington University, St. Louis. 



AN APPLICATION OF ORTHOGONALIZATION PROCESS TO THE 
THEORY OF LEAST SQUARES 

By Y. K. Wong 


Introduction 


The present paper is an outgrowth of the writer's attempt to fill a lacuna in the discussion of the Gauss method of substitution as given by many writers. For illustration, let us cite Brunt's Combination of Observations. In Chapter VI, we find:

Let the normal equations be

$$[aa]x+[ab]y+[ac]z-[al]=0,\quad [bb]y+[bc]z-[bl]=0,\quad [cc]z-[cl]=0. \qquad \text{(i)}$$

From this equation we find

$$x = -\frac{[ab]}{[aa]}\,y-\frac{[ac]}{[aa]}\,z+\frac{[al]}{[aa]}. \qquad \text{(ii)}$$

Substituting, we obtain

$$[bb.1]y+[bc.1]z-[bl.1]=0,\quad [cc.1]z-[cl.1]=0, \qquad \text{(iii)}$$

where

$$[bb.1] = [bb]-\frac{[ab][ab]}{[aa]},\ \text{etc.} \qquad \text{(iv)}$$

From the first equation in (iii),

$$y = -\frac{[bc.1]}{[bb.1]}\,z+\frac{[bl.1]}{[bb.1]}. \qquad \text{(v)}$$
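A sketch of Gauss's method of substitution for any number of unknowns, assuming the full symmetric coefficient matrix is supplied (the passage above writes only the upper triangle, as is customary for normal equations). Each elimination replaces $[st]$ by $[st]-[hs][ht]/[hh]$, which is step (iv) repeated, and back substitution follows (ii) and (v). Exact fractions are used so the pivots $[aa], [bb.1], [cc.2], \cdots$ can be inspected; the sample system and the function name are made up.

```python
from fractions import Fraction


def gauss_substitution(N, rhs):
    """Solve the symmetric normal equations N x = rhs by successive substitution.

    N[s][t] plays the role of [st] (for example N[0][1] = [ab]) and rhs[s] the role of [sl].
    Returns the solution and the successive pivots [aa], [bb.1], [cc.2], ...
    """
    size = len(N)
    M = [[Fraction(v) for v in row] for row in N]
    r = [Fraction(v) for v in rhs]
    pivots = []
    for h in range(size):
        if M[h][h] == 0:
            raise ZeroDivisionError("a pivot vanished; the normal equations are not independent")
        pivots.append(M[h][h])
        for s in range(h + 1, size):
            factor = M[h][s] / M[h][h]          # [hs]/[hh], using the symmetry of N
            for t in range(h + 1, size):
                M[s][t] -= factor * M[h][t]     # [st.h] = [st] - [hs][ht]/[hh]
            r[s] -= factor * r[h]
    x = [Fraction(0)] * size
    for h in reversed(range(size)):             # back substitution, as in (ii) and (v)
        x[h] = (r[h] - sum(M[h][t] * x[t] for t in range(h + 1, size))) / M[h][h]
    return x, pivots


N = [[4, 2, 0], [2, 5, 1], [0, 1, 3]]
rhs = [0, -5, 7]
solution, pivots = gauss_substitution(N, rhs)
print(solution, pivots)
```

For this positive definite example every pivot is positive, which is the property the paper sets out to establish in general.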


In connection with equations (ii) and (v), the question naturally arises as to whether or not these numbers [aa], [bb.1], ⋯ are all different from zero. Since $[aa] = \sum a_ia_i$, one can see that $[aa] \neq 0$ if $a_i \neq 0$ for every i. However, to show the non-vanishing of [bb.1], [cc.2], etc. is by no means simple. Many writers do not give a demonstration on this point. We know that a system of non-homogeneous linear equations has a solution if the system of equations is linearly independent. Brunt gives a discussion of the independence of the normal equations in Chapter V, Art. 36, but he does not state clearly a condition for independence. He says: "The condition of independence is in general satisfied in the problems which arise in practice. We can then proceed to the formation and solution of the normal equations." It is one of the aims of this paper to give a necessary and sufficient condition for the independence of the normal equations and to show [aa], [bb.1], etc. are all different from zero when the condition is satisfied.

In the theory of least squares, there is the classical method of the derivation of normal equations by an application of the notion of minimum in differential calculus. After the normal equations are secured, the Gauss method of substitution is applied to obtain the solution. Doolittle modifies the Gauss method of substitution so as to facilitate the labor of computation. However, when the number of parameters (or unknowns) exceeds 4, Doolittle's method is quite complicated. In the present paper the writer wishes to present a mathematical discussion of a method obtained through an application of the Gram-Schmidt orthogonalization process. This method furnishes us a new procedure for determining the most probable values of the parameters (or unknowns). The formulation of the system of normal equations will be omitted in this new procedure, which is particularly effective in fitting curves to time series. The paper can be roughly divided into three parts. The first part gives an algebraic derivation of the normal equations. The second part derives a condition for a set of observation data so that the Gauss method of substitution is applicable. The third part gives a relationship between the Gauss method of substitution and the orthogonalization process. A practical application of the results of this paper will be found in a later paper.

The process of orthogonalization was used in the 19th century, and has been applied extensively in the theory of integral equations and linear transformations in Hilbert space. In classical analysis, if $\varphi_1(x), \varphi_2(x), \cdots$, defined on (0, 1), is a normally orthogonalized system, and if $f(x)$, defined on (0, 1), is such that $f^2$ is Lebesgue integrable, then the system of Fourier coefficients

$$f_r = \int_0^1 f(x)\varphi_r(x)\,dx \qquad (r = 1, 2, \cdots)$$

has certain interesting properties, one of which is that

$$\lim_{m\to\infty}\int_0^1\Big(f(x)-\sum_{r=1}^{m}f_r\varphi_r(x)\Big)^2dx = 0.$$

The preceding notion has a close connection with the theory of least squares as outlined in many texts on statistics. In section III, the reader will find how this notion is applied in the derivation of the normal equations. Since the number of dimensions is finite, the integration process reduces to a summation process and furthermore no limiting process is used. This new derivation of normal equations has the advantages that (1) differential calculus is not used, (2) a new form of normal equations is obtained, (3) the solution of the unknowns or parameters can be immediately obtained without further application of the Gauss Method of Substitution or the Doolittle Method, and (4) the formula for the "quadratic residual" is obtained as a simple corollary.
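A finite-dimensional illustration of the idea just described, offered as a sketch and not as the procedure of section III (which lies outside this excerpt): the columns of a small curve-fitting problem are orthonormalized by the classical Gram-Schmidt process, the "Fourier coefficients" $f_r = (f, \varphi_r)$ are taken, and the fit $\sum f_r\varphi_r$ is formed directly, without normal equations. The residual is then orthogonal to every column, and the quadratic residual equals $(f, f)-\sum f_r^2$. The data and function names are made up for the example.

```python
from math import sqrt


def inner(V, U):
    """Inner product (V, U) = sum of v_i u_i."""
    return sum(v * u for v, u in zip(V, U))


def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize linearly independent vectors (classical Gram-Schmidt)."""
    basis = []
    for V in vectors:
        W = list(V)
        for E in basis:
            c = inner(V, E)
            W = [w - c * e for w, e in zip(W, E)]
        length = sqrt(inner(W, W))
        if length < tol:
            raise ValueError("the vectors are linearly dependent")
        basis.append([w / length for w in W])
    return basis


def fit_by_orthogonalization(columns, f):
    """Least-squares fit of f by the span of `columns`, via Fourier coefficients."""
    basis = gram_schmidt(columns)
    coeffs = [inner(f, E) for E in basis]                      # f_r = (f, phi_r)
    fitted = [sum(c * E[i] for c, E in zip(coeffs, basis)) for i in range(len(f))]
    residual = [fi - gi for fi, gi in zip(f, fitted)]
    return coeffs, fitted, residual


t = [0, 1, 2, 3, 4]
f = [1.0, 2.9, 5.1, 7.0, 9.2]
columns = [[1] * len(t), t]  # fit f by c0 + c1 * t, a straight line in time
coeffs, fitted, residual = fit_by_orthogonalization(columns, f)
print([round(g, 6) for g in fitted])
```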

From the results in section III, we see immediately what condition should be imposed upon the set of observation data so that the Gauss method of substitution may be applicable. In section VI, we find a necessary and sufficient condition for the independence of the system of normal equations (3.9), and also the fact that when this condition is fulfilled, then, due to the special nature of the coefficients of the unknowns, we see that the matrix is properly positive. It is on account of this fact that we are able to show that the numbers [aa], [bb.1], etc. are all different from zero. The demonstration of this point is found in section VII. In this section, we lay down a fundamental hypothesis for Gauss's method of substitution, namely, the set of observations $A_i = (a_{i1}, \cdots, a_{im})$, $i = 1, 2, \cdots, r$, is linearly independent. Lemma 7.3 may be called the fundamental lemma for Gauss's method of substitution. Some interesting properties of the numbers $[A_sA_t.h]$, where $s, t = 1, \cdots, r$, and h is less than the smaller one of $(s, t)$, are demonstrated.

From the properties of the numbers $[A_sA_t.h]$, where $s, t = 1, \cdots, r$ and h is less than the smaller one of $(s, t)$, and in comparison of the system of equations (3.7a) with the final form of equations obtained through the application of the Gauss method of substitution, we can see the relationship between the Gauss method and the Gram-Schmidt orthogonalization process. If we should like to give some credit to Gauss, we may say that the orthogonalization process was known by him, but was stated in a different form.

The writer wishes to remark that certain theorems together with proofs in sections II, IV, V and VI are obtained from E. H. Moore's lecture notes. However, the writer is responsible for any defect. Finally, I should emphasize that the use of the notion of positive matrices is only for convenience.

I. Vectors, Inner Products, and Linear Independence 

In this paper, we shall consider vectors of the form¹

$$(v_1, v_2, \ldots, v_n). \tag{1.10}$$

For convenience, we shall use capital letters to denote vectors of the type (1.10).

Let $V = (v_1, v_2, \ldots, v_n)$ and $U = (u_1, u_2, \ldots, u_n)$; then we say $V = U$ in case $v_i = u_i$ for every $i$.

We define $V + U$ by

$$V + U = (v_1 + u_1, v_2 + u_2, \ldots, v_n + u_n), \tag{1.11}$$

and $sV$, where $s$ is a number, by

$$sV = (sv_1, sv_2, \ldots, sv_n). \tag{1.12}$$

¹ If we write $v_i$ as $v(i)$, where $i = 1, 2, \ldots, n$, then $v(i)$ may be considered as a function of one variable whose range consists of a set of positive integers $(1, 2, \ldots, n)$. E. H. Moore defines a vector as a function of one variable.

We also write $sV = Vs$. In particular, when $s = -1$, we shall put $-V = (-1)V$. Then $U - V = U + (-V)$ becomes a special instance of (1.11) and (1.12).

From (1.11) and (1.12), we see that addition is commutative and associative. 
Inner Products: The inner product of two vectors $V = (v_1, \ldots, v_n)$ and $U = (u_1, \ldots, u_n)$ is defined² to be

$$(V, U) = \sum_{i=1}^{n} v_iu_i. \tag{1.2}$$

The norm of a vector $V$ is defined by $n(V) = (V, V)$, and the modulus of a vector $V$ is defined by $\operatorname{mod}(V) = +\sqrt{n(V)}$.

From (1.11), (1.12), and (1.2), we can easily prove the following theorem: 
Theorem 1. The symbol ( , ) has the following properties:

(S) $(U, V) = (V, U)$ for every $U, V$; (symmetric property)

(L$\cdot$) $(sV, U) = s(V, U) = (V, sU)$ for every $V, U$ and every number $s$;

(L+) $(U, V + W) = (U, V) + (U, W)$ for every $U, V, W$; (linear property)

(P) $(V, V) \geq 0$ for every $V$; (positive property)

(P$_0$) $(V, V) = 0$ if and only if $V$ is a zero vector. (properly positive property)
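The definitions (1.11), (1.12), and (1.2) and the properties listed in Theorem 1 can be checked mechanically. The following minimal Python sketch assumes vectors are plain lists and uses exact `Fraction` arithmetic so the identities hold without rounding; the helper names `inner`, `norm`, `modulus`, `add`, and `scale` are illustrative only.

```python
import math
from fractions import Fraction


def inner(v, u):
    """Inner product (V, U) = sum of v_i * u_i, as in (1.2)."""
    if len(v) != len(u):
        raise ValueError("vectors must have the same number of components")
    return sum(a * b for a, b in zip(v, u))


def norm(v):
    """Norm n(V) = (V, V)."""
    return inner(v, v)


def modulus(v):
    """Modulus mod(V) = +sqrt(n(V)), returned as a float."""
    return math.sqrt(norm(v))


def add(v, u):
    """Vector sum V + U, as in (1.11)."""
    return [a + b for a, b in zip(v, u)]


def scale(s, v):
    """Scalar multiple sV, as in (1.12)."""
    return [s * a for a in v]


V = [Fraction(1), Fraction(2), Fraction(-1)]
U = [Fraction(3), Fraction(0), Fraction(4)]
W = [Fraction(-2), Fraction(5), Fraction(1)]
s = Fraction(3, 2)

assert inner(U, V) == inner(V, U)                                          # (S)
assert inner(scale(s, V), U) == s * inner(V, U) == inner(V, scale(s, U))   # (L.)
assert inner(U, add(V, W)) == inner(U, V) + inner(U, W)                    # (L+)
assert norm(V) >= 0 and norm([0, 0, 0]) == 0                               # (P), (P0)
print(inner(V, U), norm(V), modulus(U))  # -1 6 5.0
```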
Linear Independence. A set of vectors $V_1, \ldots, V_r$ is said to be linearly dependent in case there exist constants $c_1, \ldots, c_r$, not all equal to 0, such that

$$c_1V_1 + \cdots + c_rV_r = 0,$$

where $0$ is a zero vector.

A set of vectors $V_1, \ldots, V_r$ is said to be linearly independent in case, whenever the constants $c_1, \ldots, c_r$ satisfy

$$c_1V_1 + \cdots + c_rV_r = 0,$$

each constant $c_i = 0$.

Theorem 2. If the set $V_1, \ldots, V_r$ is linearly independent, then none of the vectors is a zero vector, and hence, by the property (P$_0$), the norm of every vector must be different from zero.

For if $V_s$ is a zero vector, set $c_s = 1$ and $c_i = 0$ for $i \neq s$. It is obvious that

$$0\cdot V_1 + \cdots + 0\cdot V_{s-1} + 1\cdot V_s + 0\cdot V_{s+1} + \cdots + 0\cdot V_r = 0,$$

which shows that the set of vectors $V_1, \ldots, V_r$ is linearly dependent, contrary to the hypothesis.

A more general theorem is stated in

Theorem 3. If the set $V_1, \ldots, V_r$ is linearly independent, then every subset³ is also linearly independent.

We shall prove this theorem in its contrapositive form, which is as follows: If in the set $V_1, \ldots, V_r$ there exists a subset which is linearly dependent, then the whole set is also linearly dependent. Without loss of generality, let us suppose the subset $V_1, \ldots, V_s$ ($s \leq r$) to be linearly dependent. Then there exist $c_1, \ldots, c_s$, not all zero, such that

$$c_1V_1 + \cdots + c_sV_s = 0.$$

If $s = r$, then the whole set is linearly dependent. If $s < r$, then let $c_i = 0$ for $i = s + 1, s + 2, \ldots, r$. Then

$$\sum_{i=1}^{r} c_iV_i = 0,$$

which shows that the whole set is linearly dependent.

² The notation ( , ) was introduced by D. Hilbert. In treatises on least squares, the notation [ ] is used. The present writer reserves the latter notation for other purposes.

³ Consider a set of integers $(1, 2, \ldots, n)$. Then any combination of this set of $n$ distinct integers taken $r \leq n$ at a time is called a subset of the set $(1, 2, \ldots, n)$. Likewise, we call any combination of the set of vectors $V_1, V_2, \ldots, V_n$ taken $r \leq n$ at a time a subset of the whole set.

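Theorem 4.⁴ A necessary and sufficient condition for the set $V_i = (v_{i1}, \ldots, v_{in})$, $i = 1, \ldots, r$, to be linearly independent is that there exists a non-vanishing determinant of order $r$ in the array

$$\begin{matrix} v_{11} & v_{12} & \cdots & v_{1n} \\ v_{21} & v_{22} & \cdots & v_{2n} \\ \vdots & \vdots & & \vdots \\ v_{r1} & v_{r2} & \cdots & v_{rn} \end{matrix}$$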
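Theorem 4 gives a direct, if expensive, test for independence: look for an $r$-column minor of the array with a non-zero determinant. The sketch below assumes integer or rational entries; `det` uses exact fraction elimination, and `independent_by_minors` is a hypothetical helper name. Because it scans every choice of $r$ columns, it is practical only for small arrays.

```python
from fractions import Fraction
from itertools import combinations


def det(m):
    """Determinant by exact Gaussian elimination with row exchanges."""
    m = [[Fraction(x) for x in row] for row in m]
    n = len(m)
    d = Fraction(1)
    for c in range(n):
        pivot = next((i for i in range(c, n) if m[i][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            d = -d                      # a row exchange changes the sign
        d *= m[c][c]
        for i in range(c + 1, n):
            f = m[i][c] / m[c][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[c])]
    return d


def independent_by_minors(vectors):
    """Theorem 4: V_1..V_r are independent iff some r x r minor is non-zero."""
    r, n = len(vectors), len(vectors[0])
    if r > n:
        return False                    # no r x r minor exists
    return any(det([[v[j] for j in cols] for v in vectors]) != 0
               for cols in combinations(range(n), r))


print(independent_by_minors([[1, 0, 2], [0, 1, 3]]))   # True
print(independent_by_minors([[1, 2, 3], [2, 4, 6]]))   # False
```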

II. Gram-Schmidt's Orthogonalization Process

For the present section and the sequel, we shall adopt the notation $A_i = (a_{i1}, \ldots, a_{in})$, $B_i = (b_{i1}, \ldots, b_{in})$, and $C_i = (c_{i1}, \ldots, c_{in})$ for $i = 1, 2, \ldots, r$.

Theorem 5. For every set of vectors $A_1, \ldots, A_r$, there exists uniquely a set of vectors $B_1, \ldots, B_r$ such that

5.1) $(B_t, B_s) = 0$ ($t \neq s$).

5.2) For every $t$ satisfying the relation $1 \leq t \leq r$, $A_t$ is a linear combination of $B_1, \ldots, B_t$, and $B_t$ is a linear combination of $A_1, \ldots, A_t$.

5.3) $B_1 = A_1$; and for $t > 1$, $B_t - A_t$ is a linear combination of $B_1, \ldots, B_{t-1}$, and is also a linear combination of $A_1, \ldots, A_{t-1}$.

5.4) If $t > 1$, then $(A_s, B_t) = 0$ for every $s < t$.

5.5) $(A_t, B_t) = (B_t, B_t) = (B_t, A_t)$ for every $t$.

To prove this theorem, let us define

$$B_1 = A_1, \qquad B_t = A_t - \sum_{s=1}^{t-1} h_{ts}B_s \quad (1 < t \leq r), \tag{2.1}$$

where

$$h_{ts} = \begin{cases} (A_t, B_s)/n(B_s), & \text{if } n(B_s) \neq 0, \\ 0, & \text{if } n(B_s) = 0. \end{cases} \tag{2.11}$$

⁴ See Dickson, Modern Algebraic Theories, p. 55; Bôcher, Higher Algebra, p. 36.


We proceed to show that this set has the properties stated in the theorem. 

To prove 5.1), let us suppose $t < s$. This assumption is permissible since the operator ( , ) has the symmetric property. We proceed by induction on $s$, beginning with $s = 2$. First, if $A_1 = 0$, then $B_1 = 0$, and

$$(B_1, B_s) = (A_1, B_s) = (0, B_s) = 0.$$

Secondly, if $A_1 \neq 0$, then $B_1 \neq 0$ and

$$(B_1, B_2) = (A_1, A_2 - h_{21}B_1) = (A_1, A_2) - h_{21}(A_1, B_1) = (A_1, A_2) - \frac{(A_2, B_1)}{n(B_1)}(A_1, B_1) = (A_1, A_2) - (A_2, A_1)\frac{(A_1, A_1)}{n(A_1)} = 0.$$


Assume 5.1) is true for all pairs of distinct indices less than $s$. Then, for $t < s$,

$$(B_t, B_s) = \Big(B_t, A_s - \sum_{i=1}^{s-1} h_{si}B_i\Big) = (B_t, A_s) - \sum_{i=1}^{s-1} h_{si}(B_t, B_i).$$

The sum on the right hand side reduces to $h_{st}(B_t, B_t)$, since the other terms vanish by assumption. Now if $(B_t, B_t) \neq 0$, then by (2.11), $h_{st}(B_t, B_t) = (A_s, B_t)$, and by the symmetric property of ( , ), we obtain

$$(B_t, B_s) = (B_t, A_s) - (A_s, B_t) = 0.$$

If $(B_t, B_t) = 0$, then by the P$_0$-property of ( , ), we find that $B_t$ is a zero vector, and hence $(B_t, B_s) = 0$.

5.2) follows from the definition of $B_t$.

That $A_t - B_t$ is a linear combination of $B_1, \ldots, B_{t-1}$ for $t > 1$ follows from the definition of $B_t$. Since each $B_i$ ($i < t$) is a linear combination of $A_1, \ldots, A_{t-1}$, we secure the second part of 5.3).

By 5.2), we can determine $g_{si}$ such that $A_s = \sum_{i=1}^{s} g_{si}B_i$. Thus for every $s < t$, we have by 5.1)

$$(A_s, B_t) = \Big(\sum_{i=1}^{s} g_{si}B_i, B_t\Big) = \sum_{i=1}^{s} g_{si}(B_i, B_t) = 0,$$

which proves 5.4).

By 5.3), there exist $g_{ti}$ such that $A_t - B_t = \sum_{i=1}^{t-1} g_{ti}B_i$, and hence $A_t = B_t + \sum_{i=1}^{t-1} g_{ti}B_i$. Thus by 5.1), we have

$$(A_t, B_t) = \Big(B_t + \sum_{i=1}^{t-1} g_{ti}B_i, B_t\Big) = (B_t, B_t) + \sum_{i=1}^{t-1} g_{ti}(B_i, B_t) = (B_t, B_t).$$

By the symmetric property of ( , ), we secure $(B_t, A_t) = (B_t, B_t)$, which proves 5.5).
For the proof of uniqueness, let us suppose there exists a second set of vectors $B'_1, \ldots, B'_r$ having the properties 5.1), 5.2), 5.3), 5.4), and 5.5). By 5.3), we see that $B_1 = A_1 = B'_1$. Assuming the uniqueness holds true for $r = t$, we proceed to show that it is also true for $r = t + 1$. By 5.3) there exist constants $s_i, s'_i$ ($i = 1, \ldots, t$) such that

$$B_{t+1} = A_{t+1} + \sum_{i=1}^{t} s_iA_i, \qquad B'_{t+1} = A_{t+1} + \sum_{i=1}^{t} s'_iA_i.$$

Thus

$$B_{t+1} - B'_{t+1} = \sum_{i=1}^{t} (s_i - s'_i)A_i.$$

From this, we secure

$$\begin{aligned} (B_{t+1} - B'_{t+1},\ B_{t+1} - B'_{t+1}) &= \Big(B_{t+1} - B'_{t+1},\ \sum_{i=1}^{t} (s_i - s'_i)A_i\Big) \\ &= \sum_{i=1}^{t} (s_i - s'_i)\big[(B_{t+1}, A_i) - (B'_{t+1}, A_i)\big] = 0, \end{aligned}$$

by virtue of 5.4). Hence by the P$_0$-property of ( , ), we have $B_{t+1} - B'_{t+1} = 0$, and hence $B_{t+1} = B'_{t+1}$.

The set $B_1, \ldots, B_r$ with the properties stated in Theorem 5 is called the orthogonalized set of $A_1, \ldots, A_r$. This process is called Gram-Schmidt's orthogonalization process.
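The construction (2.1) and (2.11) translates almost line by line into code. The following is a sketch under the assumption that vectors are lists of rationals; `gram_schmidt` is a hypothetical name. The example is deliberately dependent ($A_3 = A_1 + A_2$) to show that the convention $h_{ts} = 0$ when $n(B_s) = 0$ lets the process run to completion and simply produces $B_3 = 0$, while properties 5.1), 5.4), and 5.5) still hold.

```python
from fractions import Fraction


def inner(v, u):
    """Inner product (V, U) of (1.2)."""
    return sum(a * b for a, b in zip(v, u))


def gram_schmidt(A):
    """Orthogonalized set B_1..B_r of A_1..A_r, following (2.1) and (2.11).

    h_ts = (A_t, B_s)/n(B_s) when n(B_s) != 0 and h_ts = 0 otherwise, so a
    vector that depends on its predecessors simply yields a zero B_t.
    """
    B = []
    for a in A:
        a = [Fraction(x) for x in a]
        b = list(a)
        for bs in B:
            n_bs = inner(bs, bs)
            h = inner(a, bs) / n_bs if n_bs != 0 else Fraction(0)
            b = [x - h * y for x, y in zip(b, bs)]
        B.append(b)
    return B


A = [[1, 1, 0], [1, 0, 1], [2, 1, 1]]   # A_3 = A_1 + A_2
B = gram_schmidt(A)
for t in range(3):
    for s in range(3):
        if s != t:
            assert inner(B[t], B[s]) == 0            # property 5.1)
        if s < t:
            assert inner(A[s], B[t]) == 0            # property 5.4)
    assert inner(A[t], B[t]) == inner(B[t], B[t])    # property 5.5)
assert all(x == 0 for x in B[2])   # the dependent A_3 gives a zero vector B_3
```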

The set $B_1, \ldots, B_r$ is called the normally orthogonalized set of $A_1, \ldots, A_r$ in case the former set enjoys the properties 5.1), 5.2), and 5.4), and if

5.5n) $(B_t, B_t) = 1$ for every $t$.

Theorem 6. If a subset $A_{k_1}, \ldots, A_{k_m}$ ($1 \leq k_1 < \cdots < k_m \leq r$) in the set $A_1, \ldots, A_r$ is linearly independent, then there is a subset $B_{k_1}, \ldots, B_{k_m}$ which has the properties stated in Theorem 5, and it is also linearly independent.

Let $h = k_m - k_1 + 1$. To prove the theorem, we may assume $k_1, \ldots, k_m$ to be $1, \ldots, h \leq r$, for otherwise we may renumber the vectors. We construct the $B$ vectors in the same way as given in equations (2.1) and (2.11). By Theorem 5, we have

$$B_1 = A_1, \qquad B_s = A_s + \sum_{i=1}^{s-1} g_{si}A_i \quad (s = 2, \ldots, h). \tag{2.2}$$

Suppose the constants $c_1, \ldots, c_h$ are such that

$$c_1B_1 + \cdots + c_hB_h = 0.$$
Then by (2.2), we secure

$$\begin{aligned} 0 &= c_1A_1 + \sum_{s=2}^{h} c_sB_s = c_1A_1 + \sum_{s=2}^{h} c_s\Big(A_s + \sum_{i=1}^{s-1} g_{si}A_i\Big) \\ &= (c_1 + c_2g_{21} + \cdots + c_hg_{h1})A_1 + (c_2 + c_3g_{32} + \cdots + c_hg_{h2})A_2 + \cdots + c_hA_h. \end{aligned}$$

Since $A_1, \ldots, A_h$ are linearly independent, we have

$$\begin{aligned} c_1 + c_2g_{21} + \cdots + c_hg_{h1} &= 0, \\ c_2 + \cdots + c_hg_{h2} &= 0, \\ &\ \,\vdots \\ c_h &= 0. \end{aligned} \tag{2.3}$$

But the determinant of the coefficients of $c_i$ ($i = 1, \ldots, h$) is

$$\begin{vmatrix} 1 & g_{21} & g_{31} & \cdots & g_{h1} \\ 0 & 1 & g_{32} & \cdots & g_{h2} \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{vmatrix} = 1.$$

Hence by a theorem in the theory of equations,⁵ the only solution that satisfies (2.3) is $c_1 = c_2 = \cdots = c_h = 0$. Thus the subset $B_1, \ldots, B_h$ is linearly independent.

Corollary. The orthogonalized set $B_1, \ldots, B_r$ is linearly independent if and only if the set $A_1, \ldots, A_r$ is linearly independent.

Theorem 7a. If a set of vectors $A_1, \ldots, A_r$ is linearly independent, then the set can be normally orthogonalized.

Let $B_i$ be the orthogonalized set of $A_i$. Since $A_1, \ldots, A_r$ is a linearly independent set, the set $B_1, \ldots, B_r$ is also linearly independent by Theorem 6. Hence by Theorem 2, the norm of every vector $B_i$ is non-vanishing. Define $C_t = B_t/\operatorname{mod}(B_t)$. Then the set $C_1, \ldots, C_r$ enjoys the properties 5.1), 5.2), 5.4), and 5.5n).
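The normalization in the proof of Theorem 7a can be sketched in floating point as follows, assuming the input vectors are linearly independent. The tolerance `tol` used to detect a vanishing norm is an arbitrary choice for illustration, not part of the text, and `normally_orthogonalize` is a hypothetical name; the loop at the end verifies condition (2.4) for the resulting $C_t$.

```python
import math


def inner(v, u):
    """Inner product (V, U) of (1.2)."""
    return sum(a * b for a, b in zip(v, u))


def normally_orthogonalize(A, tol=1e-12):
    """Return C_t = B_t / mod(B_t), with B_t built by (2.1) and (2.11).

    Assumes A_1..A_r are linearly independent (Theorem 7a), so that every
    n(B_t) is non-zero; a norm below tol is reported as dependence.
    """
    B = []
    for a in A:
        b = [float(x) for x in a]
        for bs in B:
            h = inner(a, bs) / inner(bs, bs)
            b = [x - h * y for x, y in zip(b, bs)]
        if inner(b, b) <= tol:
            raise ValueError("the vectors A_1..A_r are linearly dependent")
        B.append(b)
    return [[x / math.sqrt(inner(b, b)) for x in b] for b in B]


C = normally_orthogonalize([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
for i, ci in enumerate(C):
    for j, cj in enumerate(C):
        expected = 1.0 if i == j else 0.0            # condition (2.4)
        assert math.isclose(inner(ci, cj), expected, abs_tol=1e-12)
```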

Theorem 7b. If a set of vectors $V_1, \ldots, V_r$ is normally orthogonal, i.e. if

$$(V_i, V_j) = \begin{cases} 1 & (i = j), \\ 0 & (i \neq j), \end{cases} \tag{2.4}$$

then $V_1, \ldots, V_r$ is linearly independent.

For suppose

$$c_1V_1 + \cdots + c_rV_r = 0.$$

Then

$$\sum_{i=1}^{r} c_i(V_i, V_j) = 0 \qquad (j = 1, 2, \ldots, r).$$

By condition (2.4), the preceding expression reduces to

$$c_j = 0 \qquad (j = 1, 2, \ldots, r),$$

which shows the linear independence of $V_1, \ldots, V_r$.

⁵ Dickson, First Course in the Theory of Equations (1922), p. 119.

III. Algebraic Derivation of the Normal Equations 

Consider a linear function

$$l = p_1x_1 + p_2x_2 + \cdots + p_rx_r = \sum_{i=1}^{r} p_ix_i. \tag{3.1}$$

Let the set of observations of $x_i$ and $l$ be

$$A_i = (a_{i1}, \ldots, a_{in}), \qquad L = (l_1, \ldots, l_n) \qquad (i = 1, \ldots, r;\ n \geq r) \tag{3.2}$$

respectively; then the residual $v_j$ is

$$v_j = \sum_{i=1}^{r} p_ia_{ij} - l_j \qquad (j = 1, \ldots, n).$$

In vector notation,

$$V = \sum_{i=1}^{r} p_iA_i - L.$$

The theory of least squares requires us to find the values for $p_1, \ldots, p_r$ so as to make $(V, V)$ a minimum, or

$$\Big(\sum_{i=1}^{r} p_iA_i - L,\ \sum_{i=1}^{r} p_iA_i - L\Big) = \text{minimum}. \tag{3.3°}$$

Let $A_1, \ldots, A_r$ be linearly independent. By Theorem 7a, the vectors $A_1, \ldots, A_r$ can be normally orthogonalized. Let $C_1, \ldots, C_r$ be the normally orthogonal set. Then every $A_t$ ($t = 1, \ldots, r$) is expressible as a linear combination of $C_1, \ldots, C_t$. Let us write

$$\sum_{i=1}^{r} p_iA_i = \sum_{i=1}^{r} k_iC_i. \tag{3.3}$$

Our problem now is equivalent to that of finding the values $k_i$ ($i = 1, \ldots, r$) so as to render the inner product

$$\Big(\sum_{i=1}^{r} k_iC_i - L,\ \sum_{i=1}^{r} k_iC_i - L\Big) \tag{3.4}$$

a minimum. Expression (3.4) can be written in the form

$$\begin{aligned} &(L, L) - 2\sum_{i}(L, C_i)k_i + \sum_{i,j}(k_iC_i, k_jC_j) \\ &\quad = (L, L) - 2\sum_{i}(L, C_i)k_i + \sum_{i}k_i^2 \\ &\quad = (L, L) - \sum_{i}(L, C_i)^2 + \sum_{i}\big(k_i - (C_i, L)\big)^2. \end{aligned} \tag{3.5}$$

Hence (3.4) gives a minimum if and only if the last summation vanishes, i.e.,

$$k_i = (C_i, L) \qquad (i = 1, \ldots, r). \tag{3.6}$$
Bessel's inequality

$$\sum_{i=1}^{r} (C_i, L)^2 \leq (L, L)$$

is obtained from (3.6), (3.4), and (3.5).

To solve for $p_i$, we make use of (3.3) and (3.6), and secure

$$\sum_{i=1}^{r} p_iA_i = \sum_{i=1}^{r} (C_i, L)C_i,$$

whence

$$\Big(C_k, \sum_{i=1}^{r} p_iA_i\Big) = \Big(C_k, \sum_{i=1}^{r} (C_i, L)C_i\Big).$$

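On the right hand side we have

$$\Big(C_k, \sum_{i=1}^{r} (C_i, L)C_i\Big) = \sum_{i=1}^{r} (C_i, L)(C_k, C_i) = (C_k, L),$$

since $(C_k, C_i) = 0$ when $i \neq k$, and $(C_k, C_k) = 1$. On the left hand side, we have

$$\Big(C_k, \sum_{j=1}^{r} A_jp_j\Big) = \sum_{j=1}^{r} (C_k, A_j)p_j = \sum_{j=k}^{r} (C_k, A_j)p_j,$$

since $(C_k, A_j) = 0$ when $j < k$. Hence the values for $p_1, \ldots, p_r$ are given by

$$\sum_{j=k}^{r} (C_k, A_j)p_j = (C_k, L) \qquad (k = 1, \ldots, r), \tag{3.7}$$

where $(C_k, A_k) = (C_k, B_k) = \operatorname{mod}(B_k) \neq 0$, so that (3.7) is a triangular system which can be solved step by step, beginning with $k = r$.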

Equations (3.7) are called the normal equations; they are derived without using any notion of the differential calculus.

From (3.6) and (3.5), we secure the value for the “quadratic residual” $(V, V)$:

$$(V, V) = (L, L) - \sum_{i=1}^{r} (L, C_i)^2, \tag{3.8}$$

which is a positive quantity by virtue of Bessel's inequality.
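The derivation of (3.6) through (3.8) amounts to a short procedure: normally orthogonalize the $A_i$, take $k_i = (C_i, L)$, solve the triangular system (3.7) from $k = r$ downward, and read the quadratic residual off (3.8). This floating-point sketch assumes linearly independent $A_i$; the data (a straight-line fit with $x_1 = 1$ and $x_2$ the abscissa) are made up, and `least_squares_orthogonal` is a hypothetical name. The diagonal coefficient $(C_k, A_k)$ is computed rather than assumed, and the residual from (3.8) is compared with $(V, V)$ computed directly.

```python
import math


def inner(v, u):
    """Inner product (V, U) of (1.2)."""
    return sum(a * b for a, b in zip(v, u))


def least_squares_orthogonal(A, L):
    """Least-squares p_1..p_r via (3.6)-(3.8); A_1..A_r assumed independent.

    Returns (p, quadratic_residual).
    """
    r = len(A)
    B = []
    for a in A:                                    # Gram-Schmidt, (2.1) and (2.11)
        b = [float(x) for x in a]
        for bs in B:
            h = inner(a, bs) / inner(bs, bs)
            b = [x - h * y for x, y in zip(b, bs)]
        B.append(b)
    C = [[x / math.sqrt(inner(b, b)) for x in b] for b in B]   # C_t = B_t / mod(B_t)
    k = [inner(c, L) for c in C]                   # (3.6): k_i = (C_i, L)
    # (3.7) is triangular because (C_k, A_j) = 0 for j < k; solve from k = r down.
    p = [0.0] * r
    for kk in reversed(range(r)):
        rhs = k[kk] - sum(inner(C[kk], A[j]) * p[j] for j in range(kk + 1, r))
        p[kk] = rhs / inner(C[kk], A[kk])
    residual = inner(L, L) - sum(ki * ki for ki in k)   # (3.8)
    return p, residual


A = [[1, 1, 1, 1], [0, 1, 2, 3]]   # x_1 = 1 (constant term), x_2 = abscissa
L = [1.0, 3.0, 4.0, 8.0]
p, res = least_squares_orthogonal(A, L)
V = [sum(p[i] * A[i][j] for i in range(len(A))) - L[j] for j in range(len(L))]
assert math.isclose(res, inner(V, V), abs_tol=1e-9)
print([round(x, 6) for x in p], round(res, 6))  # [0.7, 2.2] 1.8
```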

Let $B_1, \ldots, B_r$ be an orthogonalized set of $A_1, \ldots, A_r$. Then every vector $B_i$ has a non-vanishing norm, and $B_i = \operatorname{mod}(B_i)\,C_i$. Hence from (3.7) and (3.8), we have

$$\sum_{i=k}^{r} (B_k, A_i)p_i = (B_k, L) \qquad (k = 1, 2, \ldots, r), \tag{3.7°}$$

$$(V, V) = (L, L) - \sum_{i=1}^{r} (L, B_i)^2/n(B_i). \tag{3.8°}$$

Thus we have proved the following

Theorem 8. Given a linear function (3.1), let the set of observations of $x_i$ and $l$ be

$$A_i = (a_{i1}, \ldots, a_{in}), \qquad L = (l_1, \ldots, l_n) \qquad (i = 1, \ldots, r;\ n \geq r)$$

respectively. Let $A_1, \ldots, A_r$ be linearly independent, $B_1, \ldots, B_r$ the orthogonalized set, and $C_1, \ldots, C_r$ the normally orthogonalized set of $A_1, \ldots, A_r$. Then the set of values $p_1, \ldots, p_r$ will minimize (3.3°) if and only if the system of equations (3.7°) or (3.7) holds true; in other words, $\sum_{i=1}^{r} p_iA_i - L$ is orthogonal to $C_j$, or to $B_j$, for every $j$. The quadratic residual $(V, V)$ is given by (3.8°) or (3.8).

From (3.7), we can secure the solution for $p_1, \ldots, p_r$ immediately, without further application of the Gauss method of substitution.

The proof of the following theorem does not make use of the orthogonalization process.⁶

Theorem 8°. Let $F = \sum_{i=1}^{r} p_iA_i$, where no $A_i$ is a zero vector. The set of values $p_1, \ldots, p_r$ will minimize (3.3°) if and only if $(F - L, A_i) = 0$ for every $i$, i.e., $F - L$ is orthogonal to every $A_i$.

The condition is necessary. To prove this, we show that if $(F - L, A_i) \neq 0$ for some $i$, then we can find another set $q_1, \ldots, q_r$ such that $n(F - L) > n(G - L)$, where $G = \sum q_iA_i$. For if $(F - L, A_i) \neq 0$ for some $i$, then we can find a vector $A_s$ such that $(F - L, A_s) \neq 0$. Since $A_s \neq 0$, we let $e = (F - L, A_s)/n(A_s)$ and $G = F - eA_s = \sum q_iA_i$. Then

$$n(G - L) = n(F - eA_s - L) = n(F - L) - (F - L, A_s)^2/n(A_s),$$

which shows that $n(G - L) < n(F - L)$.

To prove the sufficiency, we show that for every set $q_1, \ldots, q_r$ different from $p_1, \ldots, p_r$, $n(G - L) \geq n(F - L)$, where $G = \sum q_iA_i$. Let $s_i = q_i - p_i$ and $H = \sum s_iA_i$. Then $G = F + H$. Now if $(F - L, A_i) = 0$ for every $i$, it follows that

$$(F - L, H) = \sum_{i=1}^{r} (F - L, A_i)s_i = 0.$$

Thus

$$n(G - L) = n(F - L) + n(H).$$

Since $n(H) \geq 0$, we have $n(G - L) \geq n(F - L)$.

The preceding theorem does not require the linear independence of the vectors $A_1, \ldots, A_r$. By Theorems 7a and 7b we see that it is necessary and sufficient for the set $A_1, \ldots, A_r$ to be linearly independent in order to solve uniquely the equations $(F - L, A_i) = 0$ ($i = 1, 2, \ldots, r$), or

$$\begin{aligned} (A_1, A_1)p_1 + (A_1, A_2)p_2 + \cdots + (A_1, A_r)p_r &= (A_1, L), \\ &\ \,\vdots \\ (A_r, A_1)p_1 + (A_r, A_2)p_2 + \cdots + (A_r, A_r)p_r &= (A_r, L). \end{aligned} \tag{3.9}$$
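For comparison, the normal equations (3.9) can be formed from inner products and solved directly, after which Theorem 8° predicts that $F - L$ is orthogonal to every $A_i$. The sketch below uses exact fractions and plain Gauss-Jordan elimination, assuming the $A_i$ are linearly independent so that the system is non-singular; the helper names `solve` and `normal_equations` and the straight-line data are illustrative.

```python
from fractions import Fraction


def inner(v, u):
    """Inner product (V, U) of (1.2)."""
    return sum(a * b for a, b in zip(v, u))


def solve(M, y):
    """Solve M p = y by Gauss-Jordan elimination in exact arithmetic (M non-singular)."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(M, y)]
    for c in range(n):
        piv = next(i for i in range(c, n) if aug[i][c] != 0)   # fails if M is singular
        aug[c], aug[piv] = aug[piv], aug[c]
        for i in range(n):
            if i != c:
                f = aug[i][c] / aug[c][c]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def normal_equations(A, L):
    """Form and solve (3.9): sum_j (A_i, A_j) p_j = (A_i, L) for i = 1..r."""
    gram = [[inner(ai, aj) for aj in A] for ai in A]
    return solve(gram, [inner(ai, L) for ai in A])


A = [[1, 1, 1, 1], [0, 1, 2, 3]]
L = [1, 3, 4, 8]
p = normal_equations(A, L)
F = [sum(p[i] * A[i][j] for i in range(len(A))) for j in range(len(L))]
residual = [f - l for f, l in zip(F, L)]
assert all(inner(residual, ai) == 0 for ai in A)   # Theorem 8°: F - L orthogonal to each A_i
print(p)  # [Fraction(7, 10), Fraction(11, 5)]
```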

⁶ The proof is based on the same type of reasoning as used by Jackson. See Dunham Jackson's Theory of Approximation, pp. 151-152.
If $A_1, \ldots, A_r$ are linearly independent, the conclusion in Theorem 8° can be deduced from Theorem 8. For by Theorem 7a, $A_i = \sum_{j=1}^{i} g_{ij}C_j$, and hence

$$(F - L, A_i) = \Big(F - L, \sum_{j=1}^{i} g_{ij}C_j\Big) = \sum_{j=1}^{i} g_{ij}(F - L, C_j) = 0.$$

Also, Theorem 8 can be deduced from Theorem 8°.


IV. Matrices and Their Reciprocals 

An ordered array of numbers of the form

$$\alpha = (a_{ij}) = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{pmatrix} \tag{4.1}$$

is a matrix. If we write $a(i, j) = a_{ij}$, then the array of numbers (4.1) may be considered as a function of two variables $i, j$ on the ranges of positive integers $(1, 2, \ldots, n)$, $(1, 2, \ldots, m)$.⁷ Thus a vector is a special instance of a matrix. We shall use Greek letters to denote matrices throughout this paper unless otherwise specified. When $n = m$, i.e. the number of rows is the same as the number of columns, we have a square matrix. Associated with every $n$-row square matrix $\kappa = (k_{ij})$, a determinant can be defined, and for simplicity, we shall adopt the following notation:

$$D(\kappa) = \begin{vmatrix} k_{11} & \cdots & k_{1n} \\ \vdots & & \vdots \\ k_{n1} & \cdots & k_{nn} \end{vmatrix}.$$

An identity matrix, denoted by $\delta = (d_{ij})$, is a square matrix of which the elements in the principal diagonal are 1 and elsewhere 0, i.e. $d_{ij} = 0$ ($i \neq j$), $d_{ii} = 1$. A zero matrix, indicated by $\omega$, is one such that every one of its elements is 0. The transposed matrix, $\alpha'$, of $\alpha$ is formed by interchanging the rows and columns. We say two matrices $\alpha = (a_{ij})$ and $\beta = (b_{ij})$ are equal in case $a_{ij} = b_{ij}$ for every $i, j$. A matrix $\alpha$ is symmetric in case $\alpha' = \alpha$. The $i$th column of $\alpha$ is indicated by $\alpha(\cdot, i)$, the $i$th row of $\beta$ by $\beta(i, \cdot)$, and the element in the $i$th row and $j$th column by $\alpha(i, j)$. Hence $\alpha(i, j) = a_{ij}$.

Addition: Let $\alpha$ be a matrix given by (4.1) and $\beta = (b_{ij})$ a matrix of the same number of rows and columns as $\alpha$. Then

$$\alpha + \beta = (a_{ij} + b_{ij}).$$

⁷ E. H. Moore defines a matrix as a function of two variables.
We note that $\alpha + \beta = \beta + \alpha$. If $\gamma$ is a matrix of the same number of rows and columns as $\alpha$, then $(\alpha + \beta) + \gamma = \alpha + (\beta + \gamma)$.

Multiplication: Let $\alpha = (a_{ij})$ be defined by (4.1), and $\beta = (b_{jk})$ be a matrix of $m$ rows and $r$ columns; then the product $\pi = \alpha\beta$ is defined by

$$\pi = \Big(\sum_{j=1}^{m} a_{ij}b_{jk}\Big).$$

Thus $\pi$ is a matrix of $n$ rows and $r$ columns.

The multiplication of two matrices is not necessarily commutative.

If $\alpha$ is a matrix of $n$ rows and $m$ columns, $\beta$ of $m$ rows and $r$ columns, and $\gamma$ of $r$ rows and $s$ columns, then $\alpha(\beta\gamma) = (\alpha\beta)\gamma$. If $\alpha$ is a matrix of $n$ rows and $m$ columns, and $\beta, \gamma$ are matrices of $m$ rows and $r$ columns, then $\alpha(\beta + \gamma) = \alpha\beta + \alpha\gamma$.
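The product rule and the associative and distributive laws can be illustrated with small matrices stored as lists of rows. This is a minimal sketch; `matmul` and `madd` are hypothetical names, and the example also shows that $\alpha\beta \neq \beta\alpha$ in general.

```python
def matmul(alpha, beta):
    """Product pi = alpha beta with entries sum_j a_ij b_jk.

    alpha has n rows and m columns, beta has m rows and r columns, and the
    result has n rows and r columns.
    """
    m = len(beta)
    if any(len(row) != m for row in alpha):
        raise ValueError("the columns of alpha must match the rows of beta")
    r = len(beta[0])
    return [[sum(row[j] * beta[j][k] for j in range(m)) for k in range(r)]
            for row in alpha]


def madd(alpha, beta):
    """Sum alpha + beta = (a_ij + b_ij)."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(alpha, beta)]


alpha = [[1, 2], [0, 1]]
beta = [[3, 0], [1, 1]]
gamma = [[2, 1], [1, 0]]
print(matmul(alpha, beta))   # [[5, 2], [1, 1]]
print(matmul(beta, alpha))   # [[3, 6], [1, 3]]  (not commutative)
assert matmul(matmul(alpha, beta), gamma) == matmul(alpha, matmul(beta, gamma))
assert matmul(alpha, madd(beta, gamma)) == madd(matmul(alpha, beta), matmul(alpha, gamma))
```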

Scalar Multiplication: Let $s$ be a number, and $\alpha$ be a matrix of $n$ rows and $m$ columns; then

$$s\cdot\alpha = (sa_{ij}) = \alpha s.$$

Let $\delta_s$ denote a square matrix in which the elements in the principal diagonal are $s$, and 0 elsewhere. Then $\delta_s = s\delta$, where $\delta$ is an identity matrix of the same order. We note from the associative law of multiplication that

$$s\alpha = \delta_s\cdot\alpha = \alpha\cdot\delta_s,$$

where $\delta_s$ is of order $n$ in the first product and of order $m$ in the second.

In particular, let $s = -1$; then we have $-1\alpha$. For convenience, we write $-\alpha = -1\alpha$. From the definition of addition, we obtain a definition of subtraction for two matrices of the same number of rows and columns.

Reciprocals of Matrices: Let $\alpha$ be a matrix of $n$ rows and $m$ columns. Then a matrix $\alpha^{-1}$ of $m$ rows and $n$ columns is said to be a reciprocal of $\alpha$ in case

$$\alpha\cdot\alpha^{-1} = \delta^{n}, \qquad \alpha^{-1}\cdot\alpha = \delta^{m},$$

where $\delta^{n}$, $\delta^{m}$ are identity matrices of order $n$, $m$ respectively. If a matrix $\alpha$ has a reciprocal $\alpha^{-1}$, we can prove that $\alpha^{-1}$ is unique. It can be shown that when $\alpha$ has a reciprocal, it must be a square matrix.⁸

A matrix is said to be non-singular in case it has a reciprocal; otherwise it is said to be singular.⁹ It is evident that every zero matrix is singular, and an identity matrix is non-singular.

Suppose $\alpha$ is a square matrix of order $n$. Let us denote the cofactor of the element $a_{ij}$ of $\alpha$ by $\bar{a}_{ij}$. Then

$$\epsilon = (e_{ij}), \qquad e_{ij} = \bar{a}_{ji},$$

is called the adjoint matrix of $\alpha$.

⁸ For the proof of this statement, see Moore, Vector, Matrices, and Quaternions.

⁹ This definition is due to E. H. Moore.
If $\alpha$ is symmetric, then $\epsilon$ is also symmetric. Since $a_{i1}e_{1j} + \cdots + a_{in}e_{nj} = D(\alpha)$ or $0$ according as $i = j$ or $i \neq j$, we secure the following:

Theorem 9. Let $\alpha$ be a square matrix and $\epsilon$ its adjoint; then

$$\alpha\epsilon = \epsilon\alpha = [D(\alpha)]\delta.$$

Theorem 10. If the determinant of $\alpha$ is different from zero, then there exists a reciprocal $\alpha^{-1}$, and $\alpha^{-1} = \operatorname{adj}\alpha/D(\alpha)$.

This theorem follows from Theorem 9.

The converse of Theorem 10 is also true.
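Theorems 9 and 10 can be verified on a small example by building the adjoint from cofactors. The sketch below uses Laplace expansion, which is adequate only for small matrices, and assumes the matrix has order at least 2; the helper names `det`, `adjoint`, and `matmul` are illustrative. The chosen $\alpha$ is symmetric, so the symmetry of $\epsilon$ is checked as well.

```python
from fractions import Fraction


def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def adjoint(alpha):
    """Adjoint epsilon = (e_ij), where e_ij is the cofactor of a_ji (order >= 2)."""
    n = len(alpha)

    def cofactor(i, j):
        minor = [row[:j] + row[j + 1:] for k, row in enumerate(alpha) if k != i]
        return (-1) ** (i + j) * det(minor)

    return [[cofactor(j, i) for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


alpha = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
eps = adjoint(alpha)
D = det(alpha)
D_delta = [[D if i == j else 0 for j in range(3)] for i in range(3)]
assert matmul(alpha, eps) == matmul(eps, alpha) == D_delta       # Theorem 9
assert eps == [list(col) for col in zip(*eps)]                   # symmetric alpha, symmetric eps
inverse = [[Fraction(e, D) for e in row] for row in eps]          # Theorem 10
assert matmul(alpha, inverse) == [[1 if i == j else 0 for j in range(3)] for i in range(3)]
print(D)  # 18
```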

V. Symmetric Matrices of Positive Type¹⁰

Let $\alpha = (a_{ij})$ be a matrix of $n$ rows and $m$ columns, and let $\sigma = (k_1, \ldots, k_p)$ and $\rho = (h_1, \ldots, h_p)$ be integers among the sets $(1, \ldots, n)$ and $(1, \ldots, m)$ respectively. The subsets $\sigma$ and $\rho$ may be equal to the whole sets $(1, \ldots, n)$ and $(1, \ldots, m)$ respectively. Then

$$\alpha(\sigma, \rho) = \begin{pmatrix} a_{k_1h_1} & \cdots & a_{k_1h_p} \\ \vdots & & \vdots \\ a_{k_ph_1} & \cdots & a_{k_ph_p} \end{pmatrix}$$

is called a minor of $\alpha$. In notation we write this minor as $\alpha(\sigma, \rho)$, indicating the ranges to be $\sigma$ and $\rho$.

The minor $\alpha(-\sigma, -\rho)$, which is obtained by striking out all the rows $k_i$ and all the columns $h_j$ ($i, j = 1, \ldots, p$) from $\alpha$, is called the complementary minor of $\alpha(\sigma, \rho)$.

If $\alpha$ is a square matrix of order $n$, then $\alpha(\sigma, \sigma)$ is called a principal minor of $\alpha$.

Let $\alpha$ and $\beta$ be matrices of $n$ rows and $m$ columns, and let $\sigma, \rho$ have the same meaning as above. Then $\alpha(\sigma, \rho)$, $\beta(\sigma, \rho)$ are called corresponding minors in $\alpha$, $\beta$ respectively.

A symmetric matrix $\alpha = (a_{ij})$ of order $n$ is said to be of positive type in case the determinant of every principal minor of $\alpha$ is positive ($\geq 0$), and is said to be of properly positive type in case the determinant of every principal minor of $\alpha$ is greater than zero.

Corollary V1. Every element in the principal diagonal of a positive, symmetric matrix is positive ($\geq 0$).

For, let $\sigma$ consist of a single integer $i$; then $\alpha(\sigma, \sigma) = a_{ii} \geq 0$.

Corollary V2. If a symmetric matrix is properly positive, then every element in the principal diagonal is greater than 0.
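The definitions of positive and properly positive type can be tested by enumerating every principal minor. In this sketch "positive" is read as $\geq 0$ and "properly positive" as $> 0$, matching the definitions above; the helper names are hypothetical, and the enumeration of all $2^n - 1$ principal minors is only feasible for small $n$.

```python
from itertools import combinations


def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def principal_minor_dets(alpha):
    """Yield D[alpha(sigma, sigma)] for every non-empty subset sigma of the indices."""
    n = len(alpha)
    for p in range(1, n + 1):
        for sigma in combinations(range(n), p):
            yield det([[alpha[i][j] for j in sigma] for i in sigma])


def is_symmetric(alpha):
    return alpha == [list(col) for col in zip(*alpha)]


def is_positive_type(alpha):
    """Symmetric, with every principal minor determinant >= 0."""
    return is_symmetric(alpha) and all(d >= 0 for d in principal_minor_dets(alpha))


def is_properly_positive_type(alpha):
    """Symmetric, with every principal minor determinant > 0."""
    return is_symmetric(alpha) and all(d > 0 for d in principal_minor_dets(alpha))


print(is_positive_type([[1, 1], [1, 1]]), is_properly_positive_type([[1, 1], [1, 1]]))  # True False
print(is_properly_positive_type([[2, 1], [1, 2]]))  # True
print(is_positive_type([[0, 1], [1, 0]]))           # False
```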

Theorem 11. If a symmetric matrix $\alpha$ of order $n$ is (properly) positive, then its adjoint matrix $\epsilon$ is also symmetric and (properly) positive.

¹⁰ We follow the terminology of E. H. Moore. Moore developed this notion quite extensively.
The symmetry of $\epsilon$ is evident. Let $\sigma$ be a subset of $(1, \ldots, n)$ and let $p$ be the number of integers in $\sigma$. Consider any principal minor $\epsilon(\sigma, \sigma)$ in the adjoint matrix $\epsilon$. By a theorem in the theory of determinants, we have¹¹

$$D[\epsilon(\sigma, \sigma)] = D[\alpha(-\sigma, -\sigma)]\cdot[D(\alpha)]^{k},$$

where $k$ is an integer depending on the set $\sigma$ (namely $k = p - 1$). By hypothesis $\alpha$ is positive (properly positive); hence $D[\alpha(-\sigma, -\sigma)]$ and $[D(\alpha)]^{k}$ are positive (greater than 0), and it follows that $D[\epsilon(\sigma, \sigma)]$ is positive (greater than 0).

Theorem 12. If a symmetric matrix $\alpha$ is properly positive, then $D(\alpha)$ is different from zero, and $\alpha$ has a reciprocal which is also symmetric and properly positive.

For take $\sigma$ to be the whole set $(1, \ldots, n)$ in the definition of proper positiveness, and we see that $D(\alpha) > 0$. The theorem now follows from Theorems 10 and 11.

VI. Gramian Matrices 

In this section, we shall study the matrices of the normal equations (3.9). The main result is that if the set of observations $A_1, \ldots, A_r$ is linearly independent, then the matrix (called the Gramian matrix) is properly positive and has a reciprocal which is also properly positive.

Theorem 13. Let $A_1, \ldots, A_r$ be a set of vectors, and let $B_1, \ldots, B_r$ be the orthogonalized set of vectors. Then the matrix

$$\zeta(A_1, \ldots, A_r) = \begin{pmatrix} (A_1, A_1) & \cdots & (A_1, A_r) \\ \vdots & & \vdots \\ (A_r, A_1) & \cdots & (A_r, A_r) \end{pmatrix} \tag{6.1}$$

has the following properties:

13.1) symmetry;

13.2) $D[\zeta(A_1, \ldots, A_r)] = n(B_1)n(B_2)\cdots n(B_r)$;

13.3) positiveness.

A matrix of the form (6.1) is called a Gramian matrix.

In fact, the symmetric property follows from the fact that $(A_i, A_j) = (A_j, A_i)$ for every $i, j$.

We shall prove 13.2) by induction. For $r = 1$, we have by Theorem 5

$$(A_1, A_1) = (B_1, B_1) = n(B_1).$$

Assume the equality is true for $r = t$; we shall show it is true for $r = t + 1$. The $(t + 1)$-row determinant is as follows:

$$D[\zeta(A_1, \ldots, A_{t+1})] = \begin{vmatrix} (A_1, A_1) & \cdots & (A_1, A_t) & (A_1, A_{t+1}) \\ \vdots & & \vdots & \vdots \\ (A_t, A_1) & \cdots & (A_t, A_t) & (A_t, A_{t+1}) \\ (A_{t+1}, A_1) & \cdots & (A_{t+1}, A_t) & (A_{t+1}, A_{t+1}) \end{vmatrix}. \tag{6.2}$$

¹¹ In case $\sigma = (1, \ldots, n)$, $-\sigma$ is a null class $\Lambda$ (a class which contains no element); then we define $D[\alpha(-\sigma, -\sigma)] = 1$. For the proof of this theorem, see Bôcher, p. 31.
By Theorem 5, there exist constants $s_j$ ($j = 1, \ldots, t$) such that

$$A_{t+1} = B_{t+1} + \sum_{j=1}^{t} s_jA_j.$$

Substituting this value into the last row, we find the element in the $i$th column is

$$(A_{t+1}, A_i) = \Big(B_{t+1} + \sum_{j=1}^{t} s_jA_j, A_i\Big) = (B_{t+1}, A_i) + \sum_{j=1}^{t} s_j(A_j, A_i) \qquad (i = 1, \ldots, t, t + 1).$$

The second term on the right is a linear combination of the first $t$ elements in the $i$th column of the determinant (6.2), and hence by the theory of determinants we secure

$$D[\zeta(A_1, \ldots, A_{t+1})] = \begin{vmatrix} (A_1, A_1) & \cdots & (A_1, A_t) & (A_1, A_{t+1}) \\ \vdots & & \vdots & \vdots \\ (A_t, A_1) & \cdots & (A_t, A_t) & (A_t, A_{t+1}) \\ (B_{t+1}, A_1) & \cdots & (B_{t+1}, A_t) & (B_{t+1}, A_{t+1}) \end{vmatrix}.$$

By Theorem 5, we find that $(B_{t+1}, A_i) = 0$ for $i = 1, \ldots, t$, and $(B_{t+1}, A_{t+1}) = (B_{t+1}, B_{t+1})$, and hence the preceding determinant reduces to a form in which the first $t$ elements in the $(t + 1)$th row are zero. Thus

$$D[\zeta(A_1, \ldots, A_{t+1})] = n(B_{t+1})\begin{vmatrix} (A_1, A_1) & \cdots & (A_1, A_t) \\ \vdots & & \vdots \\ (A_t, A_1) & \cdots & (A_t, A_t) \end{vmatrix} = n(B_1)\cdots n(B_t)\,n(B_{t+1}),$$

which proves 13.2).

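Consider any subset $\sigma = (k_1, \ldots, k_m)$ of the set $(1, \ldots, r)$. By the same argument as above, applied to the vectors $A_{k_1}, \ldots, A_{k_m}$, we find that the determinant of any principal minor

$$\begin{vmatrix} (A_{k_1}, A_{k_1}) & \cdots & (A_{k_1}, A_{k_m}) \\ \vdots & & \vdots \\ (A_{k_m}, A_{k_1}) & \cdots & (A_{k_m}, A_{k_m}) \end{vmatrix} = n(B'_{k_1})\cdots n(B'_{k_m}), \tag{6.3}$$

where $B'_{k_1}, \ldots, B'_{k_m}$ denotes the orthogonalized set of $A_{k_1}, \ldots, A_{k_m}$. By Theorem 1, the number on the right is positive. Thus the matrix $\zeta$ is positive.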
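Property 13.2) can be checked numerically: build the Gramian (6.1) from a few made-up vectors, orthogonalize them by (2.1) and (2.11), and compare the determinant with the product of the norms $n(B_i)$. The sketch below uses exact fractions; the helper names are illustrative. The last assertion shows that a dependent set gives a vanishing Gramian determinant, as 13.2) predicts when some $n(B_i) = 0$.

```python
from fractions import Fraction


def inner(v, u):
    """Inner product (V, U) of (1.2)."""
    return sum(a * b for a, b in zip(v, u))


def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def gram_schmidt(A):
    """Orthogonalized set of (2.1) and (2.11), with h_ts = 0 when n(B_s) = 0."""
    B = []
    for a in A:
        b = [Fraction(x) for x in a]
        for bs in B:
            n_bs = inner(bs, bs)
            h = inner(a, bs) / n_bs if n_bs != 0 else 0
            b = [x - h * y for x, y in zip(b, bs)]
        B.append(b)
    return B


def gramian(A):
    """Gramian matrix (6.1): the (i, j) entry is (A_i, A_j)."""
    return [[inner(ai, aj) for aj in A] for ai in A]


A = [[1, 2, 0, 1], [0, 1, 1, 1], [2, 0, 1, 3]]
zeta = gramian(A)
product_of_norms = Fraction(1)
for b in gram_schmidt(A):
    product_of_norms *= inner(b, b)
assert zeta == [list(col) for col in zip(*zeta)]     # 13.1) symmetry
assert det(zeta) == product_of_norms                 # 13.2)
print(det(zeta))  # 75

# A dependent set (third vector = first + second) has a vanishing determinant.
assert det(gramian([[1, 0, 1], [0, 1, 1], [1, 1, 2]])) == 0
```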

Theorem 14. The following three assertions are equivalent:

14.1) the set $A_1, \ldots, A_r$ is linearly independent;

14.2) the Gramian matrix (6.1) is properly positive;

14.3) the determinant of the Gramian matrix (6.1) is different from zero.

We shall prove that 14.1) implies 14.2); 14.2) implies 14.3); and 14.3) implies 14.1). We thus prove the three statements are equivalent.

¹² Dickson, First Course in the Theory of Equations (1922), p. 113.
Let $B_1, \ldots, B_r$ be the orthogonalized set of the set $A_1, \ldots, A_r$. Since the set $A_1, \ldots, A_r$ is linearly independent, every subset

$$A_{k_1}, \ldots, A_{k_m} \qquad (1 \leq k_1 < \cdots < k_m \leq r)$$

is also linearly independent (Theorem 3), and hence, by the Corollary to Theorem 6 and by Theorem 2, $n(B'_{k_i}) > 0$ for $i = 1, 2, \ldots, m$, where $B'_{k_1}, \ldots, B'_{k_m}$ is the orthogonalized set of $A_{k_1}, \ldots, A_{k_m}$. By the same argument as given in the demonstration of Theorem 13, we find that the determinant of any principal minor (6.3) is greater than zero. This proves the matrix (6.1) is properly positive.

If the matrix (6.1) is properly positive, then by Theorem 12 the determinant of (6.1) is different from zero.

To prove that 14.3) implies 14.1), suppose $k_i$ ($i = 1, \ldots, r$) are such that

$$k_1A_1 + \cdots + k_rA_r = 0.$$

Then

$$(k_1A_1 + \cdots + k_rA_r, A_i) = k_1(A_1, A_i) + \cdots + k_r(A_r, A_i) = 0$$

for $i = 1, \ldots, r$. Since $(A_i, A_j) = (A_j, A_i)$ and $D(\zeta) \neq 0$, the set of constants $k_i$ must be all equal to 0.¹²

From Theorem 14 and Theorem 12, we may state the following

Corollary: If the set of observations $A_1, \ldots, A_r$ is linearly independent, then the Gramian matrix $\zeta$ has a reciprocal which is properly positive.

VII. Gauss Method of Substitution

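Lemma 7.1) Let $\varphi = (s_{ij})$ be an $r$-row symmetric matrix such that $s_{11} \neq 0$. Then there exists an $r$-row square matrix $\tau$ whose determinant is unity such that $\psi = (r_{ij}) = \tau\varphi$ has the following properties:

a) $r_{i1} = 0$ for $i > 1$, and $r_{1i} = s_{1i}$ for every $i$;

b) the first minor of $r_{11}$ is symmetric;

c) the determinant of every principal minor in $\psi$ of the form

$$\begin{vmatrix} s_{11} & s_{1k_2} & \cdots & s_{1k_m} \\ 0 & r_{k_2k_2} & \cdots & r_{k_2k_m} \\ \vdots & \vdots & & \vdots \\ 0 & r_{k_mk_2} & \cdots & r_{k_mk_m} \end{vmatrix} \qquad (2 \leq k_2 < \cdots < k_m \leq r) \tag{7.1}$$

is equal to the determinant of the corresponding principal minor in $\varphi$.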

To prove this lemma, let us define

$$\tau = \delta + F_1D_1, \tag{7.2}$$

where $D_1$ is the first row of an $r$-row identity matrix $\delta$, and $F_1$ is the column with $F_1(1) = 0$, $F_1(n) = -s_{n1}/s_{11}$ ($n > 1$). (Thus $F_1D_1$ is an $r$-row square matrix in which the first column is $F_1$ and everywhere else 0.) It is clear that $\tau$ thus defined is a square matrix of order $r$, and $D(\tau) = D(\delta + F_1D_1) = 1$. By multiplication of these two matrices, we obtain a new matrix $\psi$ such that $r_{11} = s_{11}$, $r_{i1} = 0$ for $i > 1$, and $r_{1i} = s_{1i}$ for every $i$, and further

$$r_{ij} = s_{ij} - s_{i1}s_{1j}/s_{11} \qquad (i > 1,\ j > 1). \tag{7.3}$$

See footnote 5.
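One step of the reduction in Lemma 7.1 can be sketched as follows, with exact fractions. The code builds $\tau = \delta + F_1D_1$ from (7.2), forms $\psi = \tau\varphi$, and checks property a), formula (7.3), and property b) on a made-up symmetric matrix; `first_step` is a hypothetical name.

```python
from fractions import Fraction


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def first_step(phi):
    """Lemma 7.1: tau = delta + F1 D1 with F1(1) = 0 and F1(n) = -s_n1/s_11 for n > 1.

    Returns (tau, psi) with psi = tau phi.
    """
    r = len(phi)
    s11 = Fraction(phi[0][0])
    if s11 == 0:
        raise ValueError("Lemma 7.1 requires s_11 != 0")
    F1 = [Fraction(0)] + [-Fraction(phi[n][0]) / s11 for n in range(1, r)]
    # F1 D1 has F1 as its first column and zeros elsewhere.
    tau = [[Fraction(1 if i == j else 0) + (F1[i] if j == 0 else 0) for j in range(r)]
           for i in range(r)]
    return tau, matmul(tau, phi)


phi = [[4, 2, 2], [2, 5, 3], [2, 3, 6]]
tau, psi = first_step(phi)
# tau is unit lower triangular, hence D(tau) = 1.
assert all(tau[i][i] == 1 and all(tau[i][j] == 0 for j in range(i + 1, 3)) for i in range(3))
assert all(psi[i][0] == 0 for i in range(1, 3)) and psi[0] == phi[0]   # property a)
for i in range(1, 3):
    for j in range(1, 3):
        assert psi[i][j] == phi[i][j] - Fraction(phi[i][0] * phi[0][j], phi[0][0])  # (7.3)
        assert psi[i][j] == psi[j][i]                                            # property b)
print([[str(x) for x in row] for row in psi])  # [['4', '2', '2'], ['0', '4', '2'], ['0', '2', '5']]
```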

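To prove property (b), we note that $s_{ij} = s_{ji}$, since $\varphi$ is symmetric. Thus for $i > 1$, $j > 1$, we note from (7.3) that

$$r_{ij} = s_{ji} - s_{1i}s_{j1}/s_{11} = r_{ji}.$$

For the proof of the last property, we note that the corresponding minor of (7.1) in $\varphi$ is of the form

$$\begin{vmatrix} s_{11} & s_{1k_2} & \cdots & s_{1k_m} \\ s_{k_21} & s_{k_2k_2} & \cdots & s_{k_2k_m} \\ \vdots & \vdots & & \vdots \\ s_{k_m1} & s_{k_mk_2} & \cdots & s_{k_mk_m} \end{vmatrix}. \tag{7.4}$$

Since $\varphi$ is symmetric, we have by (7.3)

$$r_{k_ik_j} = s_{k_ik_j} - s_{k_i1}s_{1k_j}/s_{11} \quad (i > 1,\ j > 1), \qquad 0 = s_{k_i1} - s_{k_i1}s_{11}/s_{11} \quad (i > 1).$$

Thus each row of (7.1) after the first is obtained from the corresponding row of (7.4) by subtracting $s_{k_i1}/s_{11}$ times the first row, and by a theorem in the theory of determinants, the determinants of (7.1) and (7.4) are equal.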

Lemma 7.2) Let $\varphi = (s_{ij})$ ($i, j = 1, \ldots, r$) be a symmetric matrix of positive type, and $s_{11} \neq 0$. Then there exists an $r$-row square matrix $\tau$ whose determinant is unity such that $\psi = (r_{ij}) = \tau\varphi$ has the properties stated in Lemma 7.1), and furthermore the minor of $r_{11}$ in $\psi$ is of positive type.

To prove the positiveness of the minor of $r_{11}$, let the determinant of any one of its principal minors be

$$M_1 = \begin{vmatrix} r_{k_2k_2} & \cdots & r_{k_2k_m} \\ \vdots & & \vdots \\ r_{k_mk_2} & \cdots & r_{k_mk_m} \end{vmatrix} \qquad (2 \leq k_2 < \cdots < k_m \leq r),$$

where $r_{k_ik_j} = r_{k_jk_i}$ ($i, j = 2, \ldots, m$) due to the symmetry. Now consider the bordered determinant

$$M_2 = \begin{vmatrix} r_{11} & r_{1k_2} & \cdots & r_{1k_m} \\ 0 & r_{k_2k_2} & \cdots & r_{k_2k_m} \\ \vdots & \vdots & & \vdots \\ 0 & r_{k_mk_2} & \cdots & r_{k_mk_m} \end{vmatrix},$$

which by property (a) in Lemma 7.1) gives $M_2 = r_{11}M_1 = s_{11}M_1$. By property (c) in Lemma 7.1), $M_2$ is equal to the determinant of the form (7.4), which by hypothesis is positive. Thus $s_{11}M_1 \geq 0$. Since $s_{11} \geq 0$ by Corollary V1 and $s_{11} \neq 0$, we have $s_{11} > 0$, and we conclude that $M_1 = M_2/s_{11} \geq 0$.
Lemma 7.3). Let $\varphi = (s_{ij})$ ($i, j = 1, 2, \ldots, r$) be a symmetric matrix of properly positive type. Then there exists an $r$-row square matrix $\tau$ whose determinant is unity such that $\psi = (r_{ij}) = \tau\varphi$ has the properties stated in Lemma 7.1), and furthermore the minor of $r_{11}$ in $\psi$ is properly positive.

Since $\varphi$ is properly positive, we find that $s_{11} > 0$. The proof of this lemma is similar to that of Lemma 7.2).

Suppose that the set of observations $A_1, \ldots, A_r$ is linearly independent. Then by Theorem 14, the Gramian matrix (6.1) is symmetric and properly positive, and hence $(A_1, A_1) > 0$. By Lemma 7.3), the matrix (6.1) can be reduced to the form

$$\begin{pmatrix} [A_1A_1\cdot 0] & [A_1A_2\cdot 0] & \cdots & [A_1A_r\cdot 0] \\ 0 & [A_2A_2\cdot 1] & \cdots & [A_2A_r\cdot 1] \\ \vdots & \vdots & & \vdots \\ 0 & [A_rA_2\cdot 1] & \cdots & [A_rA_r\cdot 1] \end{pmatrix}, \tag{7.5}$$

where

$$[A_tA_s\cdot 0] = (A_t, A_s) \qquad (t, s = 1, \ldots, r),$$

$$[A_tA_s\cdot 1] = \frac{[A_1A_1\cdot 0][A_tA_s\cdot 0] - [A_tA_1\cdot 0][A_1A_s\cdot 0]}{[A_1A_1\cdot 0]} \qquad (t, s = 2, \ldots, r).$$

It is evident that $[A_1A_1\cdot 0] = (A_1, A_1) > 0$, since the matrix (6.1) is properly positive. By Lemma 7.3), the value of $D(\zeta)$ and the determinant of (7.5) are equal, and furthermore the minor of the element $[A_1A_1\cdot 0]$ is a symmetric matrix of properly positive type. Thus $[A_2A_2\cdot 1] > 0$, and $[A_tA_s\cdot 1] = [A_sA_t\cdot 1]$.

The minor of $[A_1A_1\cdot 0]$ surely satisfies all the conditions in Lemma 7.3). We may, therefore, apply a transformation of the form (7.2) to the minor of $[A_1A_1\cdot 0]$, and secure another matrix of the same character as (7.5). In other words, we may multiply on the left of the matrix (7.5) by

$$\tau_2 = \delta + F_2D_2, \tag{7.6}$$

where $D_2$ is the second row of the $r$-row identity matrix $\delta$, and

$$F_2(n) = 0 \quad (n \leq 2); \qquad F_2(n) = -\frac{[A_nA_2\cdot 1]}{[A_2A_2\cdot 1]} \quad (n > 2).$$

In general, let

$$\tau_i = \delta + F_iD_i \qquad (i = 1, \ldots, r - 1), \tag{7.7}$$

where $D_i$ is the $i$th row of the $r$-row identity matrix $\delta$, and

$$F_i(n) = 0 \quad (n \leq i); \qquad F_i(n) = -\frac{[A_nA_i\cdot i-1]}{[A_iA_i\cdot i-1]} \quad (n > i). \tag{7.8}$$
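Continuous application of this type of transformation ultimately reduces the matrix (6.1) to the form

$$\begin{pmatrix} [A_1A_1\cdot 0] & [A_1A_2\cdot 0] & [A_1A_3\cdot 0] & \cdots & [A_1A_r\cdot 0] \\ 0 & [A_2A_2\cdot 1] & [A_2A_3\cdot 1] & \cdots & [A_2A_r\cdot 1] \\ 0 & 0 & [A_3A_3\cdot 2] & \cdots & [A_3A_r\cdot 2] \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & [A_rA_r\cdot r-1] \end{pmatrix}, \tag{7.9}$$

where

$$[A_tA_s\cdot h] = \frac{[A_hA_h\cdot h-1][A_tA_s\cdot h-1] - [A_tA_h\cdot h-1][A_hA_s\cdot h-1]}{[A_hA_h\cdot h-1]} \qquad (t, s = 1, \ldots, r;\ 1 \leq h \leq \operatorname{sm}(t, s)). \tag{7.9₁}$$

In the matrix (7.9), we see by virtue of Lemma 7.3) that $[A_iA_i\cdot i-1] > 0$ for every $i$, and $[A_tA_s\cdot h] = [A_sA_t\cdot h]$ for every $s, t$ and $0 \leq h < \operatorname{sm}(t, s)$. If $h = \operatorname{sm}(t, s)$, then $[A_tA_s\cdot h] = 0$.

Let $\tau = \tau_{r-1}\tau_{r-2}\cdots\tau_1$. Then by the associative law of multiplication of matrices, we see that

$$\tau_{r-1}\big(\cdots(\tau_1\zeta)\big) = (\tau_{r-1}\cdots\tau_1)\zeta = \tau\zeta. \tag{7.10}$$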

Thus we prove

Theorem 15. If the set of vectors $A_1, \ldots, A_r$ is linearly independent, then there
exists a square matrix $\tau$ of order $r$ such that $\tau\Gamma$ is of the form (7.9), where all elements
below the principal diagonal are 0; every element in the principal diagonal,
$[A_iA_i\cdot i-1]$ $(i = 1, \ldots, r)$, is greater than zero; and $[A_tA_s\cdot h] = [A_sA_t\cdot h]$ for
$s, t = 1, \ldots, r$ and $h \le \operatorname{sm}(t, s)$. Furthermore the determinants of the matrices
(6.1) and (7.9) are equal.
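The reduction behind Theorem 15 is Gaussian elimination applied to the Gramian matrix (6.1). The sketch below is a minimal illustration under stated assumptions, not the author's procedure verbatim: it applies $\tau_1, \ldots, \tau_{r-1}$ of (7.7)-(7.8) with exact rational arithmetic and checks that the pivots $[A_iA_i\cdot i-1]$ are positive and that their product equals the determinant of (6.1). The vectors are hypothetical sample data, and the function names (`gramian`, `reduce_gramian`, `determinant`) are illustrative.

```python
from fractions import Fraction
from itertools import permutations


def gramian(vectors):
    """Matrix of inner products (A_s, A_t), i.e. the matrix (6.1)."""
    return [[sum(x * y for x, y in zip(a, b)) for b in vectors] for a in vectors]


def reduce_gramian(g):
    """Apply tau_1, ..., tau_{r-1} of (7.7)-(7.8) and return the matrix (7.9).

    Left-multiplying by tau_i = delta + F_i D_i adds F_i(n) times row i to row n,
    where F_i(n) = -[A_n A_i . i-1] / [A_i A_i . i-1] for n > i.
    Assumes every pivot [A_i A_i . i-1] is non-zero (linearly independent A_i).
    """
    m = [[Fraction(x) for x in row] for row in g]
    r = len(m)
    for i in range(r - 1):
        pivot = m[i][i]
        for n in range(i + 1, r):
            f = -m[n][i] / pivot
            m[n] = [x + f * y for x, y in zip(m[n], m[i])]
    return m


def determinant(m):
    """Leibniz expansion; adequate for the small matrices used here."""
    size = len(m)
    total = Fraction(0)
    for perm in permutations(range(size)):
        inversions = sum(1 for i in range(size) for j in range(i + 1, size) if perm[i] > perm[j])
        term = Fraction(1)
        for row, col in enumerate(perm):
            term *= m[row][col]
        total += -term if inversions % 2 else term
    return total


vectors = [(1, 2, 0, 1), (0, 1, 1, 1), (2, 0, 1, 3)]  # hypothetical A_1, A_2, A_3
g = gramian(vectors)
tri = reduce_gramian(g)
diagonal_product = tri[0][0] * tri[1][1] * tri[2][2]
print(diagonal_product == determinant(g))  # True: the determinants of (6.1) and (7.9) agree
```

Each $\tau_i$ is unit lower triangular, so its determinant is 1; that is why the product of the pivots reproduces the determinant of the Gramian.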

We now prove the following lemma, which will be useful in a later section.

Lemma 7.4). If $[A_iA_i\cdot i-1]$ is different from zero for every $i > 0$, then for
every pair of integers $(s, t)$, where $s, t = 1, \ldots, r$, and $n \le \operatorname{sm}(t, s)$, we have

a) $[A_sA_t\cdot n] = (A_s, A_t) - \displaystyle\sum_{i=1}^{n} \frac{[A_sA_i\cdot i-1][A_iA_t\cdot i-1]}{[A_iA_i\cdot i-1]}$.

b) $[A_t(A_s + A_u)\cdot n] = [A_tA_s\cdot n] + [A_tA_u\cdot n]$, $(u = 1, \ldots, r)$.

c) $[(cA_t)A_s\cdot n] = c\,[A_tA_s\cdot n]$, ($c$ = a constant).


To prove a), take every pair $(s, t)$. We find the lemma is true for $n = 0$.
Assuming it is true for every $(s, t)$ and for $n = h < \operatorname{sm}(s, t)$, we find that $h + 1 \le \operatorname{sm}(s, t)$, and

$$
\begin{aligned}
(A_s, A_t) - \sum_{i=1}^{h+1} \frac{[A_sA_i\cdot i-1][A_iA_t\cdot i-1]}{[A_iA_i\cdot i-1]}
&= [A_sA_t\cdot h] - \frac{[A_sA_{h+1}\cdot h][A_{h+1}A_t\cdot h]}{[A_{h+1}A_{h+1}\cdot h]}\\
&= \frac{[A_{h+1}A_{h+1}\cdot h][A_sA_t\cdot h] - [A_sA_{h+1}\cdot h][A_{h+1}A_t\cdot h]}{[A_{h+1}A_{h+1}\cdot h]}\\
&= [A_sA_t\cdot h+1],
\end{aligned}
$$

for every $s$, $t$.

Parts b) and c) are true for $n = 0$. Now make use of the equality in a) and
prove by induction.

* $\operatorname{sm}(s, t)$ reads "the smaller one of $(s, t)$".


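VIII. Gauss's Method of Substitution and its Relation to Gram-Schmidt's Orthogonalization Process

Let us write the set of observations in the form:

$$\alpha = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ \vdots & \vdots & & \vdots\\ a_{r1} & a_{r2} & \cdots & a_{rn}\end{pmatrix}.$$

Let the orthogonalized set also be written in the form

$$\beta = \begin{pmatrix} b_{11} & \cdots & b_{1n}\\ \vdots & & \vdots\\ b_{r1} & \cdots & b_{rn}\end{pmatrix}.$$

From Theorems 5 and 6, we find that there exists a transformation $\kappa$ given by an
$r$-row square matrix such that $\beta = \kappa\alpha$. Thus by the associative law of
multiplication of matrices, we have

$$\beta\alpha' = (\kappa\alpha)\alpha' = \kappa(\alpha\alpha').$$

Now the matrix $\alpha\alpha'$ is the Gramian matrix $\Gamma$ of (6.1). Thus

$$\beta\alpha' = \kappa\Gamma. \tag{8.1}$$

The composite matrix $\beta\alpha'$ is of the form

$$
\begin{pmatrix}
(B_1, A_1) & (B_1, A_2) & \cdots & (B_1, A_r)\\
(B_2, A_1) & (B_2, A_2) & \cdots & (B_2, A_r)\\
\vdots & \vdots & & \vdots\\
(B_r, A_1) & (B_r, A_2) & \cdots & (B_r, A_r)
\end{pmatrix}. \tag{8.2}
$$

By Theorems 5 and 6, we note that $(B_s, A_t) = 0$ for $s > t$, and $(B_s, A_s) =
(B_s, B_s)$ for every $s$. Thus the preceding matrix can be written in the form

$$
\begin{pmatrix}
(B_1, B_1) & (B_1, A_2) & (B_1, A_3) & \cdots & (B_1, A_r)\\
0 & (B_2, B_2) & (B_2, A_3) & \cdots & (B_2, A_r)\\
\vdots & \vdots & \vdots & & \vdots\\
0 & 0 & 0 & \cdots & (B_r, B_r)
\end{pmatrix}. \tag{8.3}
$$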
We have proved the following theorem:

Theorem 16. Let $A_1, \ldots, A_r$ be a set of vectors, and $B_1, \ldots, B_r$ be the
orthogonalized set; and let $\alpha = (a_{ts})$. Then there exists a square $r$-row
matrix $\kappa$ such that $\beta = \kappa\alpha$, and $\kappa\alpha\alpha'$ is a matrix of the form (8.3), where all the
elements below the principal diagonal are zeros and every element in the principal
diagonal is non-negative. If the set $A_1, \ldots, A_r$ is linearly independent, then every
element in the principal diagonal is greater than zero.

Theorem 17. Let $A_1, \ldots, A_r$ be a set of vectors and $B_1, \ldots, B_r$ be the
orthogonalized set; and let $\alpha = (a_{ts})$, $\beta = (b_{ts})$. Then $D(\beta\alpha') = D(\alpha\alpha')$.

For by equations (2.1), we note that $D(\kappa) = 1$. Thus

$$D(\beta\alpha') = D(\kappa\alpha\alpha') = D(\kappa)\,D(\alpha\alpha') = D(\alpha\alpha').$$

Theorem 18. If the set of vectors $A_1, \ldots, A_r$ is linearly independent, the
matrix $\kappa$ arising from Gram-Schmidt's orthogonalization process is identical with
the matrix $\tau$ defined by (7.10).

To prove this theorem, we first establish the following

Lemma 8.5). If the set $A_1, \ldots, A_r$ is linearly independent, and $B_1, \ldots, B_r$ is
the orthogonalized set, then for every $t$, $h$ we have

$$(B_h, A_t) = [A_hA_t\cdot h-1].$$


By Theorem 10, the set $B_1, \ldots, B_r$ is linearly independent, and hence $N(B_i) > 0$ for
every $i$. The lemma is evidently true for every $t$ and $h = 1$. Assuming it is true
for every $t$ and $h = 1, \ldots, s$, we shall prove it is also true for every $t$ and $h = s + 1$.
Now

$$B_{s+1} = A_{s+1} - \sum_{i=1}^{s} \frac{(A_{s+1}, B_i)}{(B_i, B_i)}\,B_i = A_{s+1} - \sum_{i=1}^{s} \frac{[A_iA_{s+1}\cdot i-1]}{[A_iA_i\cdot i-1]}\,B_i.$$

Thus by the linear property of $(\ ,\ )$ we secure, for every $t$,

$$
\begin{aligned}
(B_{s+1}, A_t) &= (A_{s+1}, A_t) - \sum_{i=1}^{s} \frac{[A_iA_{s+1}\cdot i-1]}{[A_iA_i\cdot i-1]}\,(B_i, A_t)\\
&= (A_{s+1}, A_t) - \sum_{i=1}^{s} \frac{[A_iA_{s+1}\cdot i-1][A_iA_t\cdot i-1]}{[A_iA_i\cdot i-1]}\\
&= [A_{s+1}A_t\cdot s]
\end{aligned}
$$

by virtue of Lemma 7.4).

From this lemma, we conclude at once that the matrices (7.9) and (8.3) are
equal. Thus by (8.1), we have

$$\beta\alpha' = \kappa\Gamma = \tau\Gamma, \quad\text{or}\quad (\kappa - \tau)\Gamma = 0.$$

Since $\Gamma$ is non-singular (by Theorem 12), we have

$$0 = (\kappa - \tau)\Gamma\Gamma^{-1} = (\kappa - \tau)\delta = \kappa - \tau,$$

which proves the theorem.
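Theorem 18 can be checked numerically on a small example. The sketch below (a hedged illustration; the vectors and function names are hypothetical) orthogonalizes the vectors with the unnormalized Gram-Schmidt recursion used in the proof of Lemma 8.5), forms $\beta\alpha'$ (the matrix (8.2)), and compares it with the triangular matrix obtained by eliminating the Gramian as in (7.7)-(7.8). Exact fractions make the comparison an equality test rather than a floating-point tolerance check.

```python
from fractions import Fraction


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gram_schmidt(vectors):
    """Unnormalized Gram-Schmidt:
    B_{s+1} = A_{s+1} - sum_{i<=s} (A_{s+1}, B_i) / (B_i, B_i) * B_i.
    """
    basis = []
    for a in vectors:
        b = [Fraction(x) for x in a]
        for prev in basis:
            coef = dot(a, prev) / dot(prev, prev)
            b = [x - coef * y for x, y in zip(b, prev)]
        basis.append(b)
    return basis


def reduce_gramian(vectors):
    """Gaussian elimination (7.7)-(7.8) on the Gramian matrix; returns (7.9)."""
    m = [[Fraction(dot(a, b)) for b in vectors] for a in vectors]
    r = len(m)
    for i in range(r - 1):
        for n in range(i + 1, r):
            f = -m[n][i] / m[i][i]
            m[n] = [x + f * y for x, y in zip(m[n], m[i])]
    return m


vectors = [(1, 2, 0, 1), (0, 1, 1, 1), (2, 0, 1, 3)]  # hypothetical A_1, A_2, A_3
beta = gram_schmidt(vectors)
beta_alpha = [[dot(b, a) for a in vectors] for b in beta]  # the matrix (8.2)
print(beta_alpha == reduce_gramian(vectors))  # True: (8.3) coincides with (7.9)
```

Because both sides are the same upper-triangular matrix and the Gramian is non-singular, the transformation matrices that produce them must coincide, which is the content of the theorem.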

From Lemma 8.5), we have

Lemma 8.6). Let $L = (l_1, \ldots, l_n)$. Suppose the set $A_1, \ldots, A_r$ to be
linearly independent, and $B_1, \ldots, B_r$ to be the orthogonalized set. Then for
every $i$

$$[A_iL\cdot i-1] = (B_i, L).$$

Theorems 16, 17, and 18 furnish us a new method for finding the most probable
values of the unknowns in the theory of least squares. The formulation of
the system of normal equations may be omitted in this new procedure, which
may be described briefly as follows: after we obtain a set of observations
$A_1, \ldots, A_r$, we orthogonalize this set by means of Gram-Schmidt's process.
Let $L$ be a non-zero vector. The product

$$
\begin{pmatrix} b_{11} & \cdots & b_{1n}\\ \vdots & & \vdots\\ b_{r1} & \cdots & b_{rn}\end{pmatrix}
\begin{pmatrix} a_{11} & \cdots & a_{r1} & l_1\\ \vdots & & \vdots & \vdots\\ a_{1n} & \cdots & a_{rn} & l_n\end{pmatrix}
$$

will give us the result as desired by Gauss's method of substitution.
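The procedure just described can be sketched in Python. It assumes that row $A_s$ of $\alpha$ holds the coefficients of the $s$th unknown in the $n$ observation equations and that $L$ holds the $n$ observed values (this reading of the notation is an assumption). Because $\beta(\alpha' \mid L')$ is upper triangular in its first $r$ columns, the unknowns follow by back substitution, and no normal equations are formed. The function names and data are hypothetical.

```python
from fractions import Fraction


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gram_schmidt(vectors):
    """Unnormalized Gram-Schmidt orthogonalization of the rows A_1, ..., A_r."""
    basis = []
    for a in vectors:
        b = [Fraction(x) for x in a]
        for prev in basis:
            coef = dot(a, prev) / dot(prev, prev)
            b = [x - coef * y for x, y in zip(b, prev)]
        basis.append(b)
    return basis


def least_squares_by_orthogonalization(alpha, l):
    """alpha: rows A_1..A_r (coefficients of r unknowns over n observations);
    l: the n observed values L. Returns the least-squares values of the unknowns."""
    beta = gram_schmidt(alpha)
    r = len(alpha)
    # beta (alpha' | L'): upper-triangular left block (8.3), right column (B_i, L)
    upper = [[dot(beta[s], alpha[t]) for t in range(r)] for s in range(r)]
    rhs = [dot(beta[s], l) for s in range(r)]
    x = [Fraction(0)] * r
    for s in reversed(range(r)):  # back substitution
        x[s] = (rhs[s] - sum(upper[s][t] * x[t] for t in range(s + 1, r))) / upper[s][s]
    return x


# Hypothetical straight-line fit y = x_0 + x_1 z at z = 0, 1, 2, 3.
print(least_squares_by_orthogonalization([(1, 1, 1, 1), (0, 1, 2, 3)], (1, 3, 2, 5)))
```

For these four hypothetical observations the sketch returns 11/10 for both unknowns, the same values obtained by solving the normal equations $(\alpha\alpha')x = \alpha L'$ directly.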

Academia Sinica,

Peiping, China. 



A NOTE ON THE ANALYSIS OF VARIANCE¹


By Solomon Kullback

By considering a set of independent items classified in some relevant manner
into $N$ sets of $s$ items each, and by the use of a dispersion theorem of Prof. J. L.
Coolidge,² Prof. H. L. Rietz³ arrives at estimates of variance, used by Dr. R. A.
Fisher, without making use of arguments involving the number of degrees of
freedom of the items concerned.

By proceeding along the lines followed by Coolidge and Rietz, but considering
a set of independent items classified into $N$ sets of $s_i$ $(i = 1, 2, \ldots, N)$ items
each, we shall arrive at certain other important results of R. A. Fisher⁴ in his
analysis of variance.

The theorem referred to above is as follows: If independent quantities
$y_1, \ldots, y_n$ be given, their expected values being $u_1, u_2, \ldots, u_n$, while the
expected values of their squares are $\lambda_1, \lambda_2, \ldots, \lambda_n$, respectively, and if we
agree to set $\bar y = (1/n)\sum_{i=1}^{n} y_i$, $\bar u = (1/n)\sum_{i=1}^{n} u_i$, then the expected value of the
variance, $(1/n)\sum_{i=1}^{n}(y_i - \bar y)^2$, is

$$\frac{n-1}{n^2}\sum_{i=1}^{n}\left(\lambda_i - u_i^2\right) + \frac{1}{n}\sum_{i=1}^{n}(u_i - \bar u)^2. \tag{1}$$
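Coolidge's theorem (1) can be confirmed exactly for small discrete cases. The sketch below is an illustration only: it enumerates every joint outcome of a few independent, hypothetical discrete quantities $y_i$ (each given as a dictionary of value and probability), computes the expected value of $(1/n)\sum(y_i - \bar y)^2$ directly, and compares it with the right-hand side of (1). Function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def expected_variance_by_enumeration(laws):
    """laws: one dict {value: probability} per independent quantity y_i (integer values)."""
    n = len(laws)
    total = Fraction(0)
    for outcome in product(*(law.items() for law in laws)):
        ys = [value for value, _ in outcome]
        prob = Fraction(1)
        for _, p in outcome:
            prob *= p
        y_bar = Fraction(sum(ys), n)
        total += prob * sum((y - y_bar) ** 2 for y in ys) / n
    return total


def expected_variance_by_theorem(laws):
    """Right-hand side of (1): (n-1)/n^2 * sum(lambda_i - u_i^2) + (1/n) * sum(u_i - u_bar)^2."""
    n = len(laws)
    u = [sum(v * p for v, p in law.items()) for law in laws]        # E(y_i)
    lam = [sum(v * v * p for v, p in law.items()) for law in laws]  # E(y_i^2)
    u_bar = sum(u) / n
    return (Fraction(n - 1, n * n) * sum(li - ui ** 2 for li, ui in zip(lam, u))
            + sum((ui - u_bar) ** 2 for ui in u) / n)


laws = [
    {0: Fraction(1, 2), 1: Fraction(1, 2)},
    {1: Fraction(1, 3), 4: Fraction(2, 3)},
    {2: Fraction(1)},
]  # hypothetical distributions with unequal means
print(expected_variance_by_enumeration(laws) == expected_variance_by_theorem(laws))  # True
```

The second term of (1) vanishes when all the $u_i$ are equal, which is the homogeneous case used in the rest of the note.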

Suppose a set of independent items has been classified in some relevant manner
into $N$ sets of $s_i$ $(i = 1, 2, \ldots, N)$ items each as follows:

$$
\begin{matrix}
x_{11}, & x_{12}, & \ldots, & x_{1s_1}; & \bar x_1\\
x_{21}, & x_{22}, & \ldots, & x_{2s_2}; & \bar x_2\\
\vdots & \vdots & & \vdots & \vdots\\
x_{N1}, & x_{N2}, & \ldots, & x_{Ns_N}; & \bar x_N\\
 & & & & \bar x
\end{matrix} \tag{2}
$$

where $\bar x_i$ $(i = 1, 2, \ldots, N)$ is the arithmetic mean of the $i$th set and $\bar x$ the mean
of the pooled sample of $s = s_1 + s_2 + \cdots + s_N$ items.

We shall assume that the set (2) is statistically homogeneous in the sense that,
using $E(\ )$ for the expected value of the expression in the parenthesis, we may
let

$$E(x_{ij}) = a, \qquad E(x_{ij}^2) = A, \qquad (i = 1, 2, \ldots, N;\ j = 1, 2, \ldots, s_i).$$

¹ Presented to the American Mathematical Society, February 23, 1935.

² Bulletin Am. Math. Soc., Vol. 27 (1921) p. 439.

³ Bulletin Am. Math. Soc., Vol. 38 (1932) pp. 731-735.

⁴ Proceedings of the International Math. Congress, Toronto, 1924, Vol. 2, p. 802 ff.

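Then, using (1),

$$E\Big[\sum_{j=1}^{s_i}(x_{ij} - \bar x_i)^2\Big] = (s_i - 1)(A - a^2). \tag{3}$$

Summing (3) from $i = 1$ to $N$, we have

$$E\Big[\sum_{i=1}^{N}\sum_{j=1}^{s_i}(x_{ij} - \bar x_i)^2\Big] = (A - a^2)\sum_{i=1}^{N}(s_i - 1) = (s - N)(A - a^2). \tag{4}$$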


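Similarly, by using (1),

$$E\Big[\sum_{i=1}^{N} s_i(\bar x_i - \bar x)^2\Big] = \frac{N-1}{N}\sum_{i=1}^{N} s_i\left[E(\bar x_i^2) - a^2\right]. \tag{5}$$

But⁵

$$E(\bar x_i^2) - a^2 = E(\bar x_i - a)^2, \tag{6}$$

and

$$E(\bar x_i - a)^2 = \frac{A - a^2}{s_i}; \tag{7}$$

therefore

$$E\Big[\sum_{i=1}^{N} s_i(\bar x_i - \bar x)^2\Big] = (N - 1)(A - a^2). \tag{8}$$

Similarly, by using (1),

$$E\Big[\sum_{i=1}^{N}\sum_{j=1}^{s_i}(x_{ij} - \bar x)^2\Big] = (s - 1)(A - a^2). \tag{9}$$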

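Thus, in a statistically homogeneous set of items, classified as in (2), the
following estimates of variance have the same expected value:

$$
\begin{aligned}
V &= \frac{S}{s - 1}, &\text{where}\quad S &= \sum_{i=1}^{N}\sum_{j=1}^{s_i}(x_{ij} - \bar x)^2;\\
V_1 &= \frac{S_1}{s - N}, &\text{where}\quad S_1 &= \sum_{i=1}^{N}\sum_{j=1}^{s_i}(x_{ij} - \bar x_i)^2;\\
V_2 &= \frac{S_2}{N - 1}, &\text{where}\quad S_2 &= \sum_{i=1}^{N} s_i(\bar x_i - \bar x)^2.
\end{aligned} \tag{10}
$$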


These estimates are used in applying the analysis of variance to the study of
the correlation ratio, $\eta$, for uncorrelated material, where $\eta^2 = S_2/S$.
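The estimates (10) are straightforward to compute, and a short simulation illustrates the equality of their expected values. In the sketch below the items are drawn as independent normal variates with variance 1; this distribution is a hypothetical choice for illustration, since the note itself assumes only statistical homogeneity. The sketch also returns $S$, $S_1$, $S_2$, which satisfy the familiar identity $S = S_1 + S_2$ (standard, though not stated in the note). Function names are illustrative.

```python
import random


def variance_estimates(groups):
    """groups: the N sets of (2), as lists of numbers.
    Returns V, V_1, V_2 of (10) and the sums of squares (S, S_1, S_2)."""
    N = len(groups)
    s = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / s
    S = sum((x - grand) ** 2 for g in groups for x in g)                  # total
    S1 = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)      # within sets
    S2 = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))    # between sets
    return S / (s - 1), S1 / (s - N), S2 / (N - 1), (S, S1, S2)


def average_estimates(sizes, reps=10_000, seed=1):
    """Monte Carlo averages of V, V_1, V_2 for homogeneous N(0, 1) items."""
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]
    for _ in range(reps):
        groups = [[rng.gauss(0.0, 1.0) for _ in range(k)] for k in sizes]
        v, v1, v2, _ = variance_estimates(groups)
        totals[0] += v
        totals[1] += v1
        totals[2] += v2
    return [total / reps for total in totals]


print(average_estimates([3, 5, 8, 4]))  # each average should be close to the variance 1
```

The three averages differ only by sampling error; $V_2$ fluctuates most because it rests on only $N - 1$ degrees of freedom.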


Office of the Chief Signal Officer, 
Washington, D. C. 


⁵ Rietz, H. L., loc. cit., p. 733.



A PROBLEM INVOLVING THE LEXIS THEORY OF DISPERSION

By Walter A. Hendricks

The attention of the author was recently directed to a study of the hatch-
ability of chicken eggs at the U. S. Animal Husbandry Experiment Station,
Beltsville, Maryland. It was necessary to find the average hatchability of the
fertile eggs incubated for each of a number of lots of birds and the corresponding
standard errors of those averages.

It was very apparent that some methods for computing such values, in com-
mon use at the present time, do not give satisfactory results. This is due to the
fact that the fertile eggs produced by different birds vary considerably with
respect to hatchability as well as with respect to number of eggs available for
incubation. It seems reasonable to suppose that the variability in hatch-
ability of a number of fertile eggs, produced by a given number of birds, should
obey the Lexis law of dispersion. This supposition is based on two hypotheses:

(a) The probability that a fertile egg will hatch is constant for all fertile eggs
produced by the same bird during the time interval under consideration.

(b) The probability that a fertile egg will hatch varies from bird to bird.

The reader familiar with the principles of genetics may question the validity
of the first of these hypotheses. The probability that a fertile egg will hatch is
largely governed by the genes carried by the chromosomes of the ovum of the
hen and the sperm of the male bird which fertilized that ovum. The kinds of
genes carried by various ova and spermatozoa are not necessarily the same, even
when those ova and spermatozoa are produced by the same female and male
birds, respectively. However, if we have a sample of a number of fertile eggs
produced by the same hen, we are justified in assuming that the proportion of
those eggs which will hatch is constant, except for sampling fluctuations, when
successive samples of fertile eggs produced by the given hen are incubated, pro-
vided, of course, that the eggs in the successive samples were fertilized by the
same male bird or birds. The limit approached by the proportion of fertile eggs
which hatch as the number of fertile eggs produced by the given hen becomes
infinitely large may be defined as the probability that a fertile egg produced by
that hen will hatch. It will be recognized that this definition is based on purely
academic considerations, since there are physical limitations to the number of
fertile eggs which a hen can produce in a given period of time. Hypotheses (a)
and (b) are to be interpreted in the light of this definition of the probability that
a fertile egg produced by a given bird will hatch.

Let $s_1, s_2, \ldots, s_n$ represent the numbers of fertile eggs produced by $n$ birds
during a period of time and let $f_1, f_2, \ldots, f_n$, respectively, represent the numbers
of chicks obtained from those eggs when the eggs are incubated. Let $p_k = f_k/s_k$
represent the hatchability of the fertile eggs produced by the $k$th bird.

The squared standard error of $p_k$ is given by the Lexis formula:¹

$$\sigma_{p_k}^2 = \frac{PQ}{s_k} + \frac{s_k - 1}{n s_k}\sum_{t=1}^{n}(P_t - P)^2, \tag{1}$$

in which the $P_t$ represent the respective probabilities that the fertile eggs pro-
duced by the $n$ birds will hatch, $P$ is the arithmetic mean of the $P_t$, and $Q$ is
equal to $1 - P$.

The values of the probabilities, $P_t$, are not known. However, as a first
approximation to equation (1) we may write:

$$\sigma_{p_k}^2 = \frac{pq}{s_k} + \frac{s_k - 1}{n s_k}\sum_{t=1}^{n}(p_t - p)^2, \tag{2}$$

in which $p$ is the arithmetic mean of the $p_t$ and $q$ is equal to $1 - p$.

The product, $pq$, can be accepted as a reasonably close approximation to the
product, $PQ$, but the expression, $\sum_{t=1}^{n}(p_t - p)^2$, will, in general, be greater than
the expression, $\sum_{t=1}^{n}(P_t - P)^2$. The reason for this is apparent when we con-
sider that if each of these two expressions is divided by $n$, the former yields an
estimate of the squared standard deviation of the $p_t$ while the latter yields an
estimate of the squared standard deviation of the $P_t$. The standard deviation of
the $p_t$ will, in general, be greater than that of the $P_t$ because the $p_t$ are more or
less imperfect estimates of the $P_t$ and are, therefore, subject to sampling errors
from which the $P_t$ are free.

We may write:

$$\frac{1}{n}\sum_{t=1}^{n}(P_t - P)^2 = \frac{1}{n}\sum_{t=1}^{n}(p_t - p)^2 - \sigma_c^2, \tag{3}$$

in which $\sigma_c^2$ is an appropriate correction as yet undefined.

Since the $p_t$ would approach the $P_t$ as statistical limits if each of the $s_t$ were
made extremely large, it follows that $\sigma_c^2$ must approach zero as each of the $s_t$
approaches infinity. Furthermore, if $P_1 = P_2 = \cdots = P_n = P$, we must have:

$$\sigma_c^2 = \frac{1}{n}\sum_{t=1}^{n}(p_t - p)^2. \tag{4}$$

¹ The formula as given in this paper is a modification of that given by Rietz, H. L. (1927)
in his book, Mathematical Statistics, Open Court Publishing Co., Chicago, which was
necessary in order to make it applicable to relative frequencies.
These conditions suggest that $\sigma_c^2$ be defined by the equation:

$$\sigma_c^2 = \frac{pq}{n}\sum_{t=1}^{n}\frac{1}{s_t}. \tag{5}$$


If $\sigma_c^2$ is so defined, it will obviously approach zero as each of the $s_t$ approaches
infinity. Furthermore, it has been shown by Yule² that if we have a series of $n$
relative frequencies, such as the $p_t$ under discussion, based on $n$ samples of
unequal size, and the probabilities of the occurrence and non-occurrence,
respectively, of the particular event under consideration are constant from
sample to sample, the squared standard deviation of those relative frequencies
is given by a relation such as that used to define $\sigma_c^2$ in equation (5). There-
fore, the second condition is also satisfied. $\sigma_c^2$ may be interpreted as repre-
senting that part of the squared standard deviation of the $p_t$ which is due to
the unreliability of the $p_t$ as estimates of the $P_t$.

Therefore, it seems reasonable to write:

$$\frac{1}{n}\sum_{t=1}^{n}(P_t - P)^2 = \frac{1}{n}\sum_{t=1}^{n}(p_t - p)^2 - \frac{pq}{n}\sum_{t=1}^{n}\frac{1}{s_t}. \tag{6}$$


Combining equations (1) and (6), we obtain the following formula for calcu-
lating the squared standard error of $p_k$:

$$\sigma_{p_k}^2 = \frac{pq}{s_k} + \frac{s_k - 1}{n s_k}\left[\sum_{t=1}^{n}(p_t - p)^2 - pq\sum_{t=1}^{n}\frac{1}{s_t}\right]. \tag{7}$$




Since the weight of a measurement is inversely proportional to the square of
the standard error of the measurement, we are now in a position to calculate a
weighted mean, $\bar p$, of the $p_t$:

$$\bar p = \frac{\sum_{t=1}^{n} w_t p_t}{\sum_{t=1}^{n} w_t}, \tag{8}$$

in which:

$$w_t = \frac{1}{\sigma_{p_t}^2}. \tag{9}$$

The squared standard error of $\bar p$ is given by the familiar formula:

$$\sigma_{\bar p}^2 = \frac{\sum_{t=1}^{n} w_t(p_t - \bar p)^2}{(n - 1)\sum_{t=1}^{n} w_t}. \tag{10}$$

² Yule, G. Udny, 1927. Introduction to the Theory of Statistics, Charles Griffin and
Co., London.



PROBLEM INVOLVING LEXIS THEORY OF DISPERSION 


81 


It would seem that $\bar p$ may be accepted as a good estimate of the average
hatchability of the fertile eggs produced by the given lot of birds, and that
equation (10) may be used to obtain a valid estimate of the reliability of $\bar p$.
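Equations (7) to (10) translate directly into a short computation. In this sketch, `hatched` and `fertile` are hypothetical per-bird counts $f_t$ and $s_t$, and the function name is illustrative. The guard against a non-positive value from (7), which can occur when the correction $pq\sum 1/s_t$ exceeds $\sum(p_t - p)^2$ in small or very homogeneous samples, is an addition of this sketch and is not discussed in the paper.

```python
def lexis_weighted_mean(hatched, fertile):
    """hatched[t] = f_t chicks from fertile[t] = s_t fertile eggs of bird t.
    Returns (p_bar, standard error of p_bar) following equations (7)-(10)."""
    n = len(fertile)
    p_t = [f / s for f, s in zip(hatched, fertile)]
    p = sum(p_t) / n                                    # arithmetic mean of the p_t
    q = 1 - p
    spread = sum((pt - p) ** 2 for pt in p_t)
    correction = p * q * sum(1 / s for s in fertile)    # n * sigma_c^2, eq. (5)
    between = spread - correction                       # estimate of sum (P_t - P)^2, eq. (6)
    var_p = [p * q / s + (s - 1) / (n * s) * between for s in fertile]  # eq. (7)
    if min(var_p) <= 0:
        raise ValueError("eq. (7) gave a non-positive variance; correction exceeds spread")
    w = [1 / v for v in var_p]                          # eq. (9)
    p_bar = sum(wt * pt for wt, pt in zip(w, p_t)) / sum(w)            # eq. (8)
    var_p_bar = (sum(wt * (pt - p_bar) ** 2 for wt, pt in zip(w, p_t))
                 / ((n - 1) * sum(w)))                                 # eq. (10)
    return p_bar, var_p_bar ** 0.5


# Hypothetical counts for four birds (not the Beltsville data).
p_bar, se = lexis_weighted_mean([18, 30, 10, 45], [20, 50, 20, 60])
print(round(p_bar, 4), round(se, 4))
```

Birds with many fertile eggs receive larger weights only to the extent that the binomial term $pq/s_k$ dominates (7); when the between-bird term is large, the weights become nearly equal.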

However, the problem is not quite so simple. In the first place, there is
usually a small amount of positive correlation between the number of fertile
eggs produced by a bird and the hatchability of those eggs. Secondly, as
pointed out earlier in this paper, the hatchability of fertile eggs is influenced to
some extent by the male birds used to fertilize the eggs. The error involved in
neglecting the correlation between hatchability and number of fertile eggs incu-
bated does not seem to be of much importance in those practical problems which
have come to the author's attention. The effects of differences among the male
birds may be largely obviated in experimental work by frequently transferring
male birds from lot to lot during the experimental period.

The best test of the suitability of a particular formula for calculating the
standard error of an average is to compare the value of the standard error
calculated by means of the formula with the corresponding value obtained by
direct calculation from the distribution of a number of such averages obtained
under essentially the same conditions. The accompanying table gives the
standard error of the weighted average hatchability of fertile eggs calculated
for each of four lots of birds by means of equation (10), together with the corre-
sponding values obtained from the distribution of averages. The former are
designated as the "predicted" values and the latter are designated as the
"observed" values. In the calculation of the "observed" values, the various
averages were assigned the same weights which were used in the calculation
of the "predicted" values.


Comparison of "predicted" and "observed" standard errors of the weighted average
hatchability of fertile eggs, calculated for each of four lots of birds

Lot    $\bar p$     Standard error of $\bar p$
                    "Predicted"    "Observed"
1      0.7684       0.0287         0.0327
2      0.7115       0.0533         0.0561
3      0.6834       0.0355         0.0379
4      0.7260       0.0615         0.0674


The data used in these calculations involved a total of 74 birds, approxi-
mately equally divided among the four lots, and a total of 2,901 fertile eggs
which were produced and incubated during an experimental period of 48 weeks.
The agreement between the "predicted" and "observed" standard errors of the
weighted average hatchability for each lot of birds is excellent. However, the
author's experience with biological data tends to make him doubt that such
close agreement will always be found when such data are subjected to the
above treatment. The agreement in the present illustration could be less
close without indicating that the method of calculating the "predicted" stand-
ard errors is unsound.

Bureau of Animal Industry, 

U. S. Department of Agriculture, 

Washington, D. C. 



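A METHOD FOR DETERMINING THE COEFFICIENTS OF A
CHARACTERISTIC EQUATION

Paul Horst


For the characteristic equation

$$
D(x) = \begin{vmatrix} a_{11} - x & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{n1} & \cdots & a_{nn} - x\end{vmatrix}
= (-1)^n\left(x^n + c_1x^{n-1} + c_2x^{n-2} + \cdots + c_n\right)
= (-1)^n(x - \alpha_1)(x - \alpha_2)\cdots(x - \alpha_n) \tag{1}
$$

it is well known that

$$c_t = (-1)^t A_t,$$

where $A_t$ is the sum of all $t$th order co-axial minors of the determinant

$$A = \begin{vmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{n1} & \cdots & a_{nn}\end{vmatrix}. \tag{2}$$

If $n$ exceeds 3 or 4, the process of calculating all possible principal minors is
very cumbersome.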

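But another, more systematic method of calculating the $c$'s may be adopted.
Suppose we define

$$A^p = \left(a_{ij}^{(p)}\right) \tag{3}$$

to be the $p$th power of the determinant (2), and

$$s_p = \sum_{t=1}^{n}\alpha_t^{\,p}. \tag{4}$$

It may be proved¹ that

$$s_p = \sum_{i=1}^{n} a_{ii}^{(p)}. \tag{5}$$

But from Newton's identities² we have

$$s_p + c_1s_{p-1} + c_2s_{p-2} + \cdots + c_{p-1}s_1 + p\,c_p = 0. \tag{6}$$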

¹ Muir, T. & Metzler, W. H., "A Treatise on the Theory of Determinants," p. 606,
¶ 650 and 651.

² Dickson, L. E., "First Course in the Theory of Equations," p. 134, ¶ 106.



Newton's identities are ordinarily employed for calculating the sums of the
powers of the roots of a polynomial when the coefficients are known. They may
be employed equally well, however, for calculating the coefficients when the
sums of the powers are given. Thus by means of equations (5) and (6) the
coefficients of (1) may be readily calculated.

If in (2) $a_{ij} = a_{ji}$, the calculation of the successive values $a_{ij}^{(p)}$ is straight-
forward. The determinant $A$ is used as a constant multiplier so that

$$A \cdot A = A^2, \qquad A^2 \cdot A = A^3, \qquad \ldots,$$

and the multiplication is column by column. That is,

$$a_{ij}^{(p+1)} = \sum_{k=1}^{n} a_{ki}\,a_{kj}^{(p)}.$$
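A minimal sketch of the method in Python: form the successive powers of the matrix, take their traces as $s_p$ by (5), and solve Newton's identities (6) for $c_p$ one step at a time. It assumes the normalization $D(x) = (-1)^n(x^n + c_1x^{n-1} + \cdots + c_n)$, the form in which the identities hold with all plus signs (so that $c_t = (-1)^t A_t$). Exact fractions keep the division by $p$ free of rounding error. The function names are illustrative, and the sketch multiplies rows by columns, so it does not need the matrix to be symmetric.

```python
from fractions import Fraction


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def characteristic_coefficients(matrix):
    """Return [c_1, ..., c_n], normalized so that
    det(A - xI) = (-1)^n (x^n + c_1 x^(n-1) + ... + c_n)."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    power = a          # A^1
    s = []             # s[p - 1] = s_p = trace(A^p), equation (5)
    for p in range(1, n + 1):
        s.append(sum(power[i][i] for i in range(n)))
        if p < n:
            power = mat_mul(power, a)
    c = []
    for p in range(1, n + 1):
        # Newton's identity (6): s_p + c_1 s_{p-1} + ... + c_{p-1} s_1 + p c_p = 0
        partial = s[p - 1] + sum(c[j - 1] * s[p - j - 1] for j in range(1, p))
        c.append(-partial / p)
    return c


print(characteristic_coefficients([[2, 1], [1, 2]]))
```

For the symmetric matrix with rows (2, 1) and (1, 2) the sketch gives $c_1 = -4$, $c_2 = 3$, that is $x^2 - 4x + 3 = (x - 1)(x - 3)$, whose roots 1 and 3 are the eigenvalues. Only $n - 1$ matrix products are needed, instead of the evaluation of every principal minor.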



THE GENERALIZED PROBLEM OF CORRECT MATCHINGS

By Dwight W. Chapman

A method common to many experimental and testing procedures in psychology
and education is to require an individual to match, as best he can, members of
one series of items with members of a second series of quite different items, certain
of which are in some sense true apposites of items in the first series. Thus the
experimental psychology of personality has often investigated the ability of
graphologists or laymen to pair samples of handwriting produced by a group of
persons with, say, character-sketches of these same persons; and the excess of
correct matchings thus produced over the number to be expected by chance has
been used as evidence that the expressive movement of handwriting affords
characteristics diagnostic of personal traits. Fortunately, the excesses experi-
mentally obtained have often been so large as obviously to exclude the operation
of chance alone. But many empirical results show small excesses only; and the
interpretation of such findings has not hitherto been subjected to rigid statistical
analysis.

The particular statistical problem resident in this experimental procedure is
twofold, involving the estimation of the significance of (a) a given number of cor-
rect matchings produced by one individual, and (b) a given mean number of cor-
rect matchings produced by a group of individuals working with the same mate-
rial independently.

Furthermore, two cases arise in practice: (1) the two series of items are of
equal length, and each item in either series has a true apposite in the other series;
or (2) the two series may be of unequal length, in which case the longer series
contains not only a true apposite for each item of the shorter series, but, in
addition, a certain number of extra, irrelevant items which cannot be correctly
matched with any items in the shorter series. I have already given the solution
to problems (a) and (b) for case (1).¹ But case (1) forms only a corollary of the
more general case (2), to the solution of which this present paper is devoted.

(a) The Significance of a Given Number of Correct Matchings Resulting
from a Single Trial

Let there be given a series of $u$ $x$-items,

$$x_1, x_2, \ldots, x_t, \ldots, x_u,$$

and a series of $t$ $y$-items,

$$y_1, y_2, \ldots, y_t.$$

Let $t \le u$, and let the first $t$ $x$-items be in some sense true apposites of the corre-
spondingly numbered $y$-items, so that if $y_j$ be paired with $x_j$ $(j = 1, 2, \ldots, t)$,
this pairing will constitute a correct matching.

¹ The Statistics of the Method of Correct Matchings, Amer. Jour. Psychol., 46, 1934,
287-298.

The first problem which arises is that of determining the probability that a
single random arrangement of the $t$ $y$-items against $t$ of the $u$ $x$-items will result in
exactly $s$ $(= 0, 1, 2, \ldots, t)$ correct matchings.

We begin by putting the first $s$ $y$-items in correspondence with their apposite
$x$-items. Then the number of arrangements of the $t$ $y$-items in which only those
$s$ are correctly matched is the number of arrangements of the remaining $t - s$ $y$-
items against the remaining $u - s$ $x$-items such that no correct matchings occur.
With respect to these items, let

$n$ = the number of all possible arrangements,

$n(Y_j)$ = the number of arrangements such that at least the $j$th item is cor-
rectly matched with its apposite,

$n(Y_jY_k)$ = the number of arrangements such that at least both the $j$th and $k$th
items are matched with their apposites, etc.;

and let

$n(\bar Y_j)$ = the number of arrangements such that at least the $j$th item is not
matched with its apposite,

$n(\bar Y_j\bar Y_k)$ = the number of arrangements such that at least the $j$th and $k$th items
are not matched with their apposites, etc.

We have then to evaluate the expression $n(\bar Y_{s+1}\bar Y_{s+2}\cdots\bar Y_t)$, the number of ar-
rangements of the items remaining, after setting $s$ of them correctly matched,
such that no further correct matchings occur.

Now it can be shown that²

$$
\begin{aligned}
n(\bar Y_{s+1}\bar Y_{s+2}\cdots\bar Y_t) = n
&- \left[n(Y_{s+1}) + n(Y_{s+2}) + \cdots + n(Y_t)\right]\\
&+ \left[n(Y_{s+1}Y_{s+2}) + n(Y_{s+1}Y_{s+3}) + \cdots + n(Y_{t-1}Y_t)\right]\\
&- \left[n(Y_{s+1}Y_{s+2}Y_{s+3}) + \cdots + n(Y_{t-2}Y_{t-1}Y_t)\right]\\
&+ \cdots\\
&+ (-1)^{t-s}\,n(Y_{s+1}Y_{s+2}\cdots Y_t).
\end{aligned}
$$

The values of the expressions on the right side of this equation can be deter-
mined as follows:

² H. Whitney, A Logical Expansion in Mathematics, Bull. Amer. Math. Soc., 1932, 572-
579.
The value of $n$ is the number of ways in which $t - s$ items can be arranged
against $u - s$ items, which is

$$\frac{(u - s)!}{[(u - s) - (t - s)]!} = \frac{(u - s)!}{(u - t)!}.$$

The value of the first bracket (the number of arrangements of these items
such that some one of them is correctly matched) is derived by holding one of
the items matched, which can be chosen in $t - s$ ways. This leaves $t - s - 1$
$y$-items, which can be arranged against the remaining $u - s - 1$ $x$-items in
$(u - s - 1)!/(u - t)!$ ways. The product of these two expressions gives us for
the value of the first bracket

$$\left[n(Y_{s+1}) + \cdots + n(Y_t)\right] = \frac{(t - s)!\,(u - s - 1)!}{1!\,(t - s - 1)!\,(u - t)!}.$$

To evaluate the second bracket, we hold two of the $t - s$ items matched, which
can be chosen in $(t - s)!/[2!\,(t - s - 2)!]$ ways. There remain $t - s - 2$
$y$-items, which can be arranged against the remaining $u - s - 2$ $x$-items in
$(u - s - 2)!/(u - t)!$ ways. The product of these two expressions gives us

$$\left[n(Y_{s+1}Y_{s+2}) + \cdots + n(Y_{t-1}Y_t)\right] = \frac{(t - s)!\,(u - s - 2)!}{2!\,(t - s - 2)!\,(u - t)!}.$$


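Continuing thus, we develop the following series for the number of arrangements
of $t$ items against $u$ items such that the first $s$ are correctly matched and no others are:

$$
n(\bar Y_{s+1}\bar Y_{s+2}\cdots\bar Y_t) = \frac{(u - s)!}{(u - t)!}
- \frac{(t - s)!\,(u - s - 1)!}{1!\,(t - s - 1)!\,(u - t)!}
+ \frac{(t - s)!\,(u - s - 2)!}{2!\,(t - s - 2)!\,(u - t)!}
- \cdots + (-1)^{t-s}\frac{(t - s)!\,(u - t)!}{(t - s)!\,0!\,(u - t)!}.
$$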


In order to express the number of arrangements, $N(s)$, such that any $s$ correct
matchings occur, we must multiply the above series by $t!/[s!\,(t - s)!]$, which is
the number of ways in which $s$ items can be chosen from $t$ items:

$$
N(s) = \frac{t!}{s!\,(t - s)!}\left[\frac{(u - s)!}{(u - t)!}
- \frac{(t - s)!\,(u - s - 1)!}{1!\,(t - s - 1)!\,(u - t)!}
+ \cdots + (-1)^{t-s}\frac{(t - s)!\,(u - t)!}{(t - s)!\,0!\,(u - t)!}\right].
$$


And in order to obtain the probability that a single random arrangement will
result in exactly $s$ correct matchings, we must further divide by $u!/(u - t)!$,
which is the total number of ways in which $t$ items can be arranged against
$u$ items. Calling this probability $P_{(s)}$, we have then

$$
P_{(s)} = \frac{t!\,(u - t)!}{u!\,s!\,(t - s)!}\left[\frac{(u - s)!}{(u - t)!}
- \frac{(t - s)!\,(u - s - 1)!}{1!\,(t - s - 1)!\,(u - t)!}
+ \cdots + (-1)^{t-s}\frac{(t - s)!\,(u - t)!}{(t - s)!\,0!\,(u - t)!}\right].
$$
Finally, factoring $(t - s)!/(u - t)!$ out of all terms in the bracket, the series sim-
plifies to³

$$
P_{(s)} = \frac{t!}{s!\,u!}\left[\frac{(u - s)!}{0!\,(t - s)!}
- \frac{(u - s - 1)!}{1!\,(t - s - 1)!}
+ \frac{(u - s - 2)!}{2!\,(t - s - 2)!}
- \cdots + (-1)^{t-s}\frac{(u - t)!}{(t - s)!\,0!}\right]. \tag{1}
$$
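Equation (1) is easy to check against a brute-force count for small series. The sketch below evaluates (1) with exact fractions and, separately, enumerates all $u!/(u - t)!$ arrangements of the $t$ $y$-items against the $u$ $x$-items, counting $y_j$ as correctly matched when it is placed against $x_j$. The series lengths are hypothetical and the function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def p_exactly(s, t, u):
    """Equation (1): probability of exactly s correct matchings when t y-items
    are arranged at random against u x-items (t <= u)."""
    bracket = sum(
        Fraction((-1) ** k * factorial(u - s - k), factorial(k) * factorial(t - s - k))
        for k in range(t - s + 1)
    )
    return Fraction(factorial(t), factorial(s) * factorial(u)) * bracket


def p_exactly_brute_force(s, t, u):
    """Direct count: arr[j] is the x-item placed against y_j; x_t, ..., x_{u-1} are extras."""
    arrangements = list(permutations(range(u), t))  # u!/(u-t)! arrangements
    hits = sum(1 for arr in arrangements if sum(arr[j] == j for j in range(t)) == s)
    return Fraction(hits, len(arrangements))


t, u = 3, 5  # hypothetical series lengths
print([p_exactly(s, t, u) for s in range(t + 1)])
print(all(p_exactly(s, t, u) == p_exactly_brute_force(s, t, u) for s in range(t + 1)))  # True
```

With $t = u$ the same function reproduces the classical matching (derangement) probabilities, for example $P_{(0)} = 3/8$ for $t = u = 4$.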


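In any practical situation, the significant question is not the probability that
exactly $s$ correct matchings shall occur, but the probability of $s$ or more correct
matchings. Obviously

$$P_{(s\text{ or more})} = P_{(s)} + P_{(s+1)} + \cdots + P_{(t)},$$

whence, by equation (1),

$$
\begin{aligned}
P_{(s\text{ or more})} ={}& \frac{t!}{s!\,u!}\left[\frac{(u - s)!}{0!\,(t - s)!} - \frac{(u - s - 1)!}{1!\,(t - s - 1)!} + \frac{(u - s - 2)!}{2!\,(t - s - 2)!} - \cdots + (-1)^{t-s}\frac{(u - t)!}{(t - s)!\,0!}\right]\\
&+ \frac{t!}{(s+1)!\,u!}\left[\frac{(u - s - 1)!}{0!\,(t - s - 1)!} - \frac{(u - s - 2)!}{1!\,(t - s - 2)!} + \cdots + (-1)^{t-s-1}\frac{(u - t)!}{(t - s - 1)!\,0!}\right]\\
&+ \frac{t!}{(s+2)!\,u!}\left[\frac{(u - s - 2)!}{0!\,(t - s - 2)!} - \cdots + (-1)^{t-s-2}\frac{(u - t)!}{(t - s - 2)!\,0!}\right]\\
&+ \cdots + \frac{t!}{t!\,u!}\left[\frac{(u - t)!}{0!\,0!}\right].
\end{aligned} \tag{2}
$$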


Or, collecting terms in a form better suited to practical computation from tables
of factorials and reciprocals,

$$
\begin{aligned}
P_{(s\text{ or more})} = \frac{t!}{u!}\Bigg\{&\frac{(u - s)!}{(t - s)!}\left[\frac{1}{0!\,s!}\right]
+ \frac{(u - s - 1)!}{(t - s - 1)!}\left[\frac{1}{0!\,(s+1)!} - \frac{1}{1!\,s!}\right]\\
&+ \frac{(u - s - 2)!}{(t - s - 2)!}\left[\frac{1}{0!\,(s+2)!} - \frac{1}{1!\,(s+1)!} + \frac{1}{2!\,s!}\right] + \cdots\\
&+ \frac{(u - t)!}{0!}\left[\frac{1}{0!\,t!} - \frac{1}{1!\,(t-1)!} + \frac{1}{2!\,(t-2)!} - \cdots + (-1)^{t-s}\frac{1}{(t-s)!\,s!}\right]\Bigg\}.
\end{aligned}
$$

³ In the special case in which the series of $x$-items and the series of $y$-items are of the same
length, whence $t = u$, equation (1) reduces to

$$P_{(s)} = \frac{1}{s!}\left[\frac{1}{0!} - \frac{1}{1!} + \frac{1}{2!} - \cdots + (-1)^{t-s}\frac{1}{(t-s)!}\right].$$


(b) The Significance of a Given Mean Number of Correct Matchings Result-
ing from n Independent Trials

A frequent practical situation is that in which interest centers on the signifi-
cance of the mean number of correct matchings achieved by a group of $n$ indi-
viduals working independently with the same two series.

In order to determine the probability that the mean number of correct match-
ings, $\bar s$, resulting from $n$ independent trials shall equal or exceed a given value, we
are required to describe the distribution of the means of samples of size $n$ drawn
at random from a parent population in which the variable is $s$ $(= 0, 1, 2, \ldots, t)$
with relative frequencies $P_{(0)}, P_{(1)}, P_{(2)}, \ldots, P_{(t)}$, given by equation (1). The
tabulation of this parent distribution follows:

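Table I: Distribution of $s$

$$
\begin{array}{c|l}
s & \text{Relative frequency } (= P_{(s)})\\ \hline
0 & \dfrac{t!}{0!\,u!}\left[\dfrac{u!}{0!\,t!} - \dfrac{(u-1)!}{1!\,(t-1)!} + \dfrac{(u-2)!}{2!\,(t-2)!} - \dfrac{(u-3)!}{3!\,(t-3)!} + \cdots + (-1)^{t}\dfrac{(u-t)!}{t!\,0!}\right]\\[2ex]
1 & \dfrac{t!}{1!\,u!}\left[\dfrac{(u-1)!}{0!\,(t-1)!} - \dfrac{(u-2)!}{1!\,(t-2)!} + \dfrac{(u-3)!}{2!\,(t-3)!} - \cdots + (-1)^{t-1}\dfrac{(u-t)!}{(t-1)!\,0!}\right]\\[2ex]
2 & \dfrac{t!}{2!\,u!}\left[\dfrac{(u-2)!}{0!\,(t-2)!} - \dfrac{(u-3)!}{1!\,(t-3)!} + \cdots + (-1)^{t-2}\dfrac{(u-t)!}{(t-2)!\,0!}\right]\\[2ex]
\vdots & \qquad\vdots\\
t & \dfrac{t!}{t!\,u!}\left[\dfrac{(u-t)!}{0!\,0!}\right]
\end{array}
$$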


We now determine the first four moments, $\nu_1$, $\nu_2$, $\nu_3$, and $\nu_4$, of this distribution
about the origin $s = 0$. Since, in general,

$$\nu_k = \sum_{s=0}^{t}\left[s^k \times (\text{Relative frequency of } s)\right] = \sum_{s=0}^{t} s^k P_{(s)},$$

the tabulation for the computation of any moment is as follows:
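Table II: The Computation of the $k$th Moment of the Distribution of $s$

$$
\begin{array}{c|l}
s & s^k P_{(s)}\\ \hline
1 & \dfrac{1^k\,t!}{1!\,u!}\left[\dfrac{(u-1)!}{0!\,(t-1)!} - \dfrac{(u-2)!}{1!\,(t-2)!} + \dfrac{(u-3)!}{2!\,(t-3)!} - \cdots + (-1)^{t-1}\dfrac{(u-t)!}{(t-1)!\,0!}\right]\\[2ex]
2 & \dfrac{2^k\,t!}{2!\,u!}\left[\dfrac{(u-2)!}{0!\,(t-2)!} - \dfrac{(u-3)!}{1!\,(t-3)!} + \cdots + (-1)^{t-2}\dfrac{(u-t)!}{(t-2)!\,0!}\right]\\[2ex]
3 & \dfrac{3^k\,t!}{3!\,u!}\left[\dfrac{(u-3)!}{0!\,(t-3)!} - \cdots + (-1)^{t-3}\dfrac{(u-t)!}{(t-3)!\,0!}\right]\\[2ex]
\vdots & \qquad\vdots\\
t & \dfrac{t^k\,t!}{t!\,u!}\left[\dfrac{(u-t)!}{0!\,0!}\right]
\end{array}
$$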

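Noting that

$$\frac{1^k}{1!} = \frac{1^{k-1}}{0!}, \quad \frac{2^k}{2!} = \frac{2^{k-1}}{1!}, \quad \frac{3^k}{3!} = \frac{3^{k-1}}{2!}, \quad \ldots, \quad \frac{t^k}{t!} = \frac{t^{k-1}}{(t-1)!},$$

and multiplying the terms in brackets by these factors, we develop Table III:

Table III

$$
\begin{array}{c|l}
s & s^k P_{(s)}\\ \hline
1 & \dfrac{t!}{u!}\left[\dfrac{1^{k-1}(u-1)!}{0!\,0!\,(t-1)!} - \dfrac{1^{k-1}(u-2)!}{0!\,1!\,(t-2)!} + \dfrac{1^{k-1}(u-3)!}{0!\,2!\,(t-3)!} - \cdots + (-1)^{t-1}\dfrac{1^{k-1}(u-t)!}{0!\,(t-1)!\,0!}\right] \quad (t\text{ terms})\\[2ex]
2 & \dfrac{t!}{u!}\left[\dfrac{2^{k-1}(u-2)!}{1!\,0!\,(t-2)!} - \dfrac{2^{k-1}(u-3)!}{1!\,1!\,(t-3)!} + \cdots + (-1)^{t-2}\dfrac{2^{k-1}(u-t)!}{1!\,(t-2)!\,0!}\right]\\[2ex]
3 & \dfrac{t!}{u!}\left[\dfrac{3^{k-1}(u-3)!}{2!\,0!\,(t-3)!} - \cdots + (-1)^{t-3}\dfrac{3^{k-1}(u-t)!}{2!\,(t-3)!\,0!}\right]\\[2ex]
\vdots & \qquad\vdots\\
t & \dfrac{t!}{u!}\left[\dfrac{t^{k-1}(u-t)!}{(t-1)!\,0!\,0!}\right]
\end{array}
$$

The terms containing $(u-j)!/(t-j)!$ lie along the $j$th diagonal: the 1st diagonal is the first term of row 1, the 2nd diagonal is the second term of row 1 together with the first term of row 2, the 3rd diagonal is the third term of row 1, the second term of row 2, and the first term of row 3, and so on.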
Since each series in brackets is one term shorter than the preceding series, the
table forms a system of $t$ diagonals. The sum which gives us $\nu_k$ may therefore
be considered as the sum of these diagonals.

Now, from inspection, it is evident that the general ($j$th) diagonal is of the form

$$
\begin{aligned}
j\text{th diagonal} &= \frac{t!\,(u-j)!}{u!\,(t-j)!}\left[\frac{j^{k-1}}{(j-1)!\,0!} - \frac{(j-1)^{k-1}}{(j-2)!\,1!} + \cdots + (-1)^{j-1}\frac{1^{k-1}}{0!\,(j-1)!}\right]\\
&= \frac{t!\,(u-j)!}{u!\,(t-j)!}\cdot\frac{1}{(j-1)!}\sum_{r=0}^{j-1}(-1)^r\binom{j-1}{r}(j-r)^{k-1}.
\end{aligned}
$$

But it can be shown⁴ that

$$\sum_{r=0}^{j-1}(-1)^r\binom{j-1}{r}(j-r)^{k-1} = 0 \quad\text{when } k < j;$$

whence

$$j\text{th diagonal} = 0 \quad\text{when } k < j.$$

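Therefore $\nu_k$ is given simply by the sum of the first $k$ diagonals of Table III.
Or, in general,

$$
\begin{aligned}
\nu_k ={}& \frac{t!\,(u-1)!}{u!\,(t-1)!}\left[\frac{1^{k-1}}{0!\,0!}\right]
+ \frac{t!\,(u-2)!}{u!\,(t-2)!}\left[\frac{2^{k-1}}{1!\,0!} - \frac{1^{k-1}}{0!\,1!}\right]
+ \frac{t!\,(u-3)!}{u!\,(t-3)!}\left[\frac{3^{k-1}}{2!\,0!} - \frac{2^{k-1}}{1!\,1!} + \frac{1^{k-1}}{0!\,2!}\right]\\
&+ \cdots + \frac{t!\,(u-k)!}{u!\,(t-k)!}\left[\frac{k^{k-1}}{(k-1)!\,0!} - \frac{(k-1)^{k-1}}{(k-2)!\,1!} + \cdots + (-1)^{k-1}\frac{1^{k-1}}{0!\,(k-1)!}\right].
\end{aligned}
$$

To this equation we must, of course, add the condition $k \le t$.

⁴ E. Netto, Lehrbuch der Combinatorik, Leipzig, 1901, 249, Formula 17.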
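The diagonal formula for $\nu_k$ can be checked against the direct definition $\nu_k = \sum s^k P_{(s)}$. In the sketch below (function names and the values of $t$, $u$ are hypothetical), `diagonal_bracket(j, k)` is the bracket of the $j$th diagonal. That bracket equals the Stirling number of the second kind $S(k, j)$, which is why it vanishes for $j > k$ and why only the first $k$ diagonals survive. When $k > t$, all $t$ diagonals are summed.

```python
from fractions import Fraction
from math import factorial


def p_exactly(s, t, u):
    """Equation (1)."""
    bracket = sum(
        Fraction((-1) ** k * factorial(u - s - k), factorial(k) * factorial(t - s - k))
        for k in range(t - s + 1)
    )
    return Fraction(factorial(t), factorial(s) * factorial(u)) * bracket


def moment_direct(k, t, u):
    """nu_k = sum over s of s^k P_(s)."""
    return sum(s ** k * p_exactly(s, t, u) for s in range(t + 1))


def diagonal_bracket(j, k):
    """Bracket of the j-th diagonal of Table III:
    j^(k-1)/((j-1)! 0!) - (j-1)^(k-1)/((j-2)! 1!) + ... (equals S(k, j))."""
    return sum(
        Fraction((-1) ** r * (j - r) ** (k - 1), factorial(j - r - 1) * factorial(r))
        for r in range(j)
    )


def moment_by_diagonals(k, t, u):
    """Sum of the first k diagonals (all t of them if k > t)."""
    total = Fraction(0)
    for j in range(1, min(k, t) + 1):
        lead = Fraction(factorial(t) * factorial(u - j), factorial(u) * factorial(t - j))
        total += lead * diagonal_bracket(j, k)
    return total


print(diagonal_bracket(5, 3))  # 0: the 5th diagonal vanishes when k = 3
print(all(moment_direct(k, 4, 7) == moment_by_diagonals(k, 4, 7) for k in range(1, 7)))  # True
```

The leading factor $t!\,(u-j)!/[u!\,(t-j)!]$ is the $j$th factorial moment of $s$, so the diagonal formula is the usual expansion of ordinary moments in factorial moments.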
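Solving now for the first four moments, we have

$$
\begin{aligned}
\nu_1 &= \frac{t}{u},\\
\nu_2 &= \frac{t}{u}\left[1 + \frac{t-1}{u-1}\right],\\
\nu_3 &= \frac{t}{u}\left[1 + 3\,\frac{t-1}{u-1} + \frac{(t-1)(t-2)}{(u-1)(u-2)}\right],\\
\nu_4 &= \frac{t}{u}\left[1 + 7\,\frac{t-1}{u-1} + 6\,\frac{(t-1)(t-2)}{(u-1)(u-2)} + \frac{(t-1)(t-2)(t-3)}{(u-1)(u-2)(u-3)}\right].
\end{aligned}
$$

If now we define, for convenience,

$$a = \frac{t}{u},\qquad b = \frac{t-1}{u-1},\qquad c = \frac{t-2}{u-2},\qquad d = \frac{t-3}{u-3},$$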


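we have, for the constants of the distribution of $s$,

$$
\begin{aligned}
\text{Mean} &= \nu_1 = a,\\
\mu_2 &= \nu_2 - \nu_1^2 = a(1+b) - a^2, \qquad\text{whence}\qquad \sigma = \sqrt{a(1+b) - a^2},\\
\mu_3 &= \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3 = a(1 + 3b + bc) - 3a^2(1+b) + 2a^3,\\
\mu_4 &= \nu_4 - 4\nu_1\nu_3 + 6\nu_1^2\nu_2 - 3\nu_1^4 = a(1 + 7b + 6bc + bcd) - 4a^2(1 + 3b + bc) + 6a^3(1+b) - 3a^4.
\end{aligned}
\qquad (5)
$$

From these constants we can determine the skewness and kurtosis of the distribution of $s$,

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2}. \qquad (6)$$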

Now it is known that the means of samples of size $n$ drawn from a parent population with constants $\beta_1$ and $\beta_2$ are distributed in such a way that

$$\beta_1(\text{means}) = \frac{\beta_1}{n}, \qquad \beta_2(\text{means}) = 3 + \frac{\beta_2 - 3}{n}. \qquad (7)$$

Therefore, having determined the beta-constants for the distribution of $s$, we can determine the beta-constants of the distribution of $\bar s$, the mean number of correct matchings resulting from $n$ independent trials.

Now when $t = u$, we have

$$a = b = c = d = 1,$$

and equations (5) give us for the distribution of $s$

$$\text{Mean} = 1,\qquad \mu_2 = 1,\qquad \mu_3 = 1,\qquad \mu_4 = 4,$$

and therefore, for the constants of the distribution of $\bar s$, we have, by equations (7),

$$\bar\beta_1 = \frac{1}{n}, \qquad \bar\beta_2 = 3 + \frac{1}{n},$$

the values for $s$ itself being $\beta_1 = 1$ and $\beta_2 = 4$, which indicates a positively skewed and leptokurtic distribution. The effect of increasing $u$ and holding $t$ constant is to increase the skewness, as shown in the following table for $t = 5$:


$$
\begin{array}{ccc}
t & u & \bar\beta_1\\
\hline
5 & 5 & 1/n\\
5 & 6 & 1.05/n\\
5 & 7 & 1.16/n\\
5 & 8 & 1.31/n\\
5 & 9 & 1.46/n
\end{array}
$$

The degrees of skewness and kurtosis met with in practical cases of matching with any considerable number of judges ($n$) are such that a Pearson Type III distribution curve gives a reasonably good fit to the distribution of mean numbers of correct matchings. If, therefore, we have to determine the significance of any obtained mean number of correct matchings, we may resort to Salvosa's tables¹ of the area under the Type III curve.

As a concrete example of the application of this method let us imagine that 10 judges have arranged 5 character sketches against 8 specimens of handwriting, 5 of which are true apposites of the sketches. Let the total number of correct matchings achieved by this group be 12, whence the mean number per judge is 1.2. We have, then,

$$\bar s = 1.2,\quad n = 10,\quad t = 5,\quad u = 8,\quad\text{whence}\quad a = \frac{t}{u} = .625,\quad b = \frac{t-1}{u-1} = .571,\quad c = \frac{t-2}{u-2} = .500.$$

We now find the mean, standard deviation, and $\bar\beta_1$ of the distribution of $\bar s$, as follows:

The mean of the distribution of $\bar s$ is, by sampling theory, the same as the mean of the distribution of $s$:

$$\text{Mean} = a = .625.$$

The second moment of the distribution of $\bar s$ is, by sampling theory, $1/n$ times the second moment of the distribution of $s$; whence, by equation (5),

$$\text{Standard deviation} = \sqrt{\frac{1}{n}\left[a(1+b) - a^2\right]} = .243.$$

And, by equations (5) and (7),

$$\bar\beta_1 = \frac{1}{n}\cdot\frac{\left[a(1 + 3b + bc) - 3a^2(1+b) + 2a^3\right]^2}{\left[a(1+b) - a^2\right]^3} = .132.$$

Now the obtained mean number of correct matchings was 1.2, and the next lower number which could have occurred (corresponding to a total of 11 instead of 12 for the group of judges) is 1.1. The lower boundary of the class-interval whose midpoint is $\bar s = 1.2$ is therefore 1.15; and it is the area above this boundary under the curve of $\bar s$ in which we are interested.

¹ L. R. Salvosa, Tables of Pearson's Type III function, Ann. Math. Statist., 1, 1930, 191-198.
The deviation of this boundary from the mean of $\bar s$ is

$$1.15 - .625 = .525,$$

and this deviation expressed in terms of the standard deviation gives

$$\frac{.525}{.243} = 2.16.$$

Entering Salvosa's table for the deviation 2.16 and skewness $\sqrt{\bar\beta_1} = .36$, we find by interpolation that so good a performance should be expected by chance only about 23 times in 1000.
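The arithmetic of this example can be reproduced as follows. The script is our own sketch of the steps above, using equations (5) and (7); it stops at the standardized deviation and $\sqrt{\bar\beta_1}$, because the tail area itself is read from Salvosa's Type III table. A library routine such as `scipy.stats.pearson3` could stand in for the table, but it is not used here.

```python
import math
from fractions import Fraction

n, t, u = 10, 5, 8            # judges, character sketches, specimens
observed_mean = 1.2           # 12 correct matchings shared among 10 judges
class_interval = 1 / n        # possible means step by 1/n = 0.1

a = Fraction(t, u)
b = Fraction(t - 1, u - 1)
c = Fraction(t - 2, u - 2)

mu2 = a * (1 + b) - a ** 2                                   # equations (5)
mu3 = a * (1 + 3 * b + b * c) - 3 * a ** 2 * (1 + b) + 2 * a ** 3

mean_sbar = float(a)                                         # same mean as s
sd_sbar = math.sqrt(float(mu2) / n)                          # second moment divided by n
beta1_sbar = float(mu3 ** 2 / mu2 ** 3) / n                  # equations (6) and (7)

lower_boundary = observed_mean - class_interval / 2          # 1.15
z = (lower_boundary - mean_sbar) / sd_sbar

print(f"mean={mean_sbar:.3f} sd={sd_sbar:.3f} sqrt(beta1)={math.sqrt(beta1_sbar):.2f} z={z:.2f}")
# mean=0.625 sd=0.243 sqrt(beta1)=0.36 z=2.16
```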



MOMENTS ABOUT THE ARITHMETIC MEAN OF A BINOMIAL
FREQUENCY DISTRIBUTION

W. J. KIRKHAM, Oregon State College

Although the most useful moments of a binomial distribution have been derived as a function of the parameters of the generating binomial for any binomial frequency series, a generalization of notation and procedure is well worth our consideration. The problem attempted in this paper is the calculation of the moments about the mean for the general frequency series of Table I.

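TABLE I

The generalized binomial frequency series

$$
\begin{array}{cc}
x\ (\text{item}) & f\ (\text{frequency})\\
\hline
0 & Nq^n\\
1 & N\,{}_nC_1\,pq^{n-1}\\
2 & N\,{}_nC_2\,p^2q^{n-2}\\
\vdots & \vdots\\
n & Np^n
\end{array}
$$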


In calculating the moments of a set of data about any value, it is often found convenient to use an arbitrary origin, define the moments about this value, and represent the desired moments in terms of those calculated. In the general binomial series, the origin of the $x$'s is found to be the best arbitrary origin. These intermediate moments are

$$\nu_1 = \frac{\sum fx}{N} = M,\ \text{arithmetic mean};\qquad \nu_2 = \frac{\sum fx^2}{N};\qquad \ldots;\qquad \nu_n = \frac{\sum fx^n}{N}, \qquad (1)$$

where $\nu_n$ is the $n$th moment.

The moments ($\mu$'s) about the mean are easily defined as functions of the $\nu$'s from fundamental definitions of the $\mu$'s. Denoting the $n$th moment by $\mu_n$, we have

$$\mu_1 = \frac{\sum f(x - M)}{N} = 0,\qquad \mu_2 = \frac{\sum f(x - M)^2}{N} = \nu_2 - \nu_1^2,\qquad \mu_3 = \frac{\sum f(x - M)^3}{N} = \nu_3 - 3\nu_2\nu_1 + 2\nu_1^3. \qquad (2)$$

In general,

$$\mu_n = \nu_n - {}_nC_1\,\nu_{n-1}\nu_1 + {}_nC_2\,\nu_{n-2}\nu_1^2 - \cdots + (-1)^{n-1}(n-1)\,\nu_1^n. \qquad (3)$$

Or, if we let $\{\nu\}^n = \nu_n$, we may express the $n$th moment by a simple notation:

$$\mu_n = \{\mu\}^n = \{\nu\}^n - {}_nC_1\{\nu\}^{n-1}\nu_1 + {}_nC_2\{\nu\}^{n-2}\nu_1^2 - \cdots = (\{\nu\} - \nu_1)^n. \qquad (4)$$

Solving the equation for $\{\nu\}$,

$$\{\nu\} = \{\mu\} + \nu_1.$$

Raising both sides to the $n$th power and substituting for the "brace" notation,

$$\nu_n = \mu_n + {}_nC_1\,\mu_{n-1}\nu_1 + {}_nC_2\,\mu_{n-2}\nu_1^2 + \cdots + \nu_1^n.$$

Whence

$$\mu_n = \nu_n - {}_nC_1\,\mu_{n-1}\nu_1 - {}_nC_2\,\mu_{n-2}\nu_1^2 - \cdots - \nu_1^n, \qquad (5)$$

a semi-recursion formula.

The original formula for $\mu_n$ contained $n$ moments or variables; and since there are only $(n - 2)$ of the $\mu$'s which are of lower order than $\mu_n$, it is necessary to retain $\nu_n$ and $\nu_1$ in (5). Since $\mu_1 = 0$, one term in the expansion of $\mu_n$ is zero. For instance, when $n = 5$, we have

$$\mu_5 = \nu_5 - 5\mu_4\nu_1 - 10\mu_3\nu_1^2 - 10\mu_2\nu_1^3 - \nu_1^5.$$
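Formulas (3) and (5) are easy to cross-check on any frequency series. In the sketch below (names ours) `nu` is a list of moments about the arbitrary origin with `nu[0] = 1`; the semi-recursion uses only $\nu_n$, $\nu_1$ and the lower-order $\mu$'s, exactly as the text says, and both routes are compared with the definition of $\mu_n$.

```python
from fractions import Fraction
from math import comb


def central_direct(nu, n):
    """Equation (3): mu_n = sum_j (-1)^j nC_j nu_{n-j} nu_1^j, with nu[0] = 1."""
    return sum((-1) ** j * comb(n, j) * nu[n - j] * nu[1] ** j for j in range(n + 1))


def central_semi_recursive(nu, order):
    """Equation (5): mu_n = nu_n - sum_{j>=1} nC_j mu_{n-j} nu_1^j, from mu_0 = 1, mu_1 = 0."""
    mu = [Fraction(1), Fraction(0)]
    for n in range(2, order + 1):
        mu.append(nu[n] - sum(comb(n, j) * mu[n - j] * nu[1] ** j for j in range(1, n + 1)))
    return mu


# Any small frequency series will do; these values are made up for illustration.
data = [0, 1, 1, 2, 3, 5]
nu = [Fraction(sum(x ** k for x in data), len(data)) for k in range(7)]
mu = central_semi_recursive(nu, 6)
for n in range(2, 7):
    direct = Fraction(sum((x - nu[1]) ** n for x in data), len(data))
    assert mu[n] == central_direct(nu, n) == direct
print(mu[2], mu[5])  # 8/3 35
```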


To calculate $\mu_k$, it is necessary to calculate the $\nu$'s from $\nu_1$ to $\nu_k$. For the binomial series, these $\nu$'s are

$$
\begin{aligned}
\nu_1 &= nq^{n-1}p + 2\,\frac{n(n-1)}{1\cdot 2}\,p^2q^{n-2} + 3\,\frac{n(n-1)(n-2)}{1\cdot 2\cdot 3}\,p^3q^{n-3} + \cdots + np^n\\
&= np\left[q^{n-1} + (n-1)pq^{n-2} + \cdots + p^{n-1}\right]\\
&= np(q+p)^{n-1} = np,\\
\nu_2 &= np\left[1\cdot q^{n-1} + 2(n-1)pq^{n-2} + 3\,\frac{(n-1)(n-2)}{2!}\,p^2q^{n-3} + \cdots + np^{n-1}\right],\\
\nu_3 &= np\left[1^2q^{n-1} + 2^2(n-1)pq^{n-2} + 3^2\,\frac{(n-1)(n-2)}{2!}\,p^2q^{n-3} + \cdots + n^2p^{n-1}\right],\\
\nu_k &= np\left[1^{k-1}q^{n-1} + 2^{k-1}(n-1)pq^{n-2} + 3^{k-1}\,\frac{(n-1)(n-2)}{2!}\,p^2q^{n-3} + \cdots + n^{k-1}p^{n-1}\right].
\end{aligned}
$$

In the simplified form of $\nu_k$, the [ ] is the $(k-1)$th moment about $-1$ of the binomial series generated by the binomial $(q + p)^{n-1}$. Denoting this [ ] by $\nu'_{k-1}(n-1)$, the $\nu$'s can be expressed by the formula

$$\nu_k = np\,\nu'_{k-1}(n-1), \qquad (6)$$

where $\nu'$ is a function of $(n-1)$ and $(k-1)$ while $\nu_k$ was a function of $n$ and $k$.

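Let us see how a $\nu'$ in $\nu_k$ can be defined in terms of the $\nu$'s of lower order than $k$. In finding this relationship, a consideration of the two series of Table II will be helpful.

TABLE II

$$
\begin{array}{cc|cc}
x' & f & x & f\\
\hline
1 & N\,{}_{n-1}C_0\,p^0q^{n-1} & 0 & N\,{}_{n-1}C_0\,p^0q^{n-1}\\
2 & N\,{}_{n-1}C_1\,p^1q^{n-2} & 1 & N\,{}_{n-1}C_1\,p^1q^{n-2}\\
\vdots & \vdots & \vdots & \vdots\\
n & N\,{}_{n-1}C_{n-1}\,p^{n-1}q^0 & n-1 & N\,{}_{n-1}C_{n-1}\,p^{n-1}q^0
\end{array}
$$

The [ ] in $\nu_k$ for Table I is equal to the $(k-1)$th moment of $x'$ about $x' = 0$. Or

$$\nu'_{k-1}\ (x',\ \text{Table II}) = \nu'_{k-1}\ (\text{Table I}) = \nu'_{k-1}(n-1).$$

Also $\nu_{k-1}$ for $x$, Table II, is $\nu_{k-1}$ for the series generated by $(q + p)^{n-1}$.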

The desired relationship between the $\nu$'s for the two series of Table II can be found by making use of the equations expressing the equality of the $\mu$'s for $x$ and $x'$. Dropping the variable which shows the number of items, the same for the two series of Table II, in the $\nu$ notation, we have

$$
\begin{aligned}
\mu'_2 &= \mu_2:\qquad \nu'_2 - \nu'^2_1 = \nu_2 - \nu_1^2,\\
\nu'_2 &= \nu_2 - 2\nu_1(\nu_1 - \nu'_1) + (\nu_1 - \nu'_1)^2,\\
\mu'_3 &= \mu_3:\qquad \nu'_3 - 3\nu'_2\nu'_1 + 2\nu'^3_1 = \nu_3 - 3\nu_2\nu_1 + 2\nu_1^3,\\
\nu'_3 &= \nu_3 - 3\nu_2\nu_1 + 2\nu_1^3 + 3\nu'_2\nu'_1 - 2\nu'^3_1.
\end{aligned}
$$

Substituting the value of $\nu'_2$ in the right member of $\nu'_3$,

$$\nu'_3 = \nu_3 - 3\nu_2(\nu_1 - \nu'_1) + 3\nu_1(\nu_1 - \nu'_1)^2 - (\nu_1 - \nu'_1)^3.$$

In general,

$$\nu'_k = \nu_k - {}_kC_1\,\nu_{k-1}(\nu_1 - \nu'_1) + {}_kC_2\,\nu_{k-2}(\nu_1 - \nu'_1)^2 - \cdots + (-1)^k(\nu_1 - \nu'_1)^k. \qquad (7)$$

The formula just derived may be used to define the moments about any origin in terms of those about the original zero of the $x$'s. For our immediate use, the formula simplifies since $\nu'_1 = \nu_1 + 1$. Then

$$\nu'_k = \nu_k + {}_kC_1\,\nu_{k-1} + {}_kC_2\,\nu_{k-2} + {}_kC_3\,\nu_{k-3} + \cdots + 1. \qquad (8)$$

By simple analysis we found the value of $\nu_1$ to be $np$. By the method of continuation, we are able to extend the list of $\nu$'s to any number. $\nu'$ from (8) is used in (6) with $n$ replaced by $(n-1)$ in the $\nu$'s.

$$
\begin{aligned}
\nu_0 &= 1,\\
\nu_1 &= np,\\
\nu_2 &= np\,\nu'_1(n-1) = np\left[\nu_1(n-1) + \nu_0(n-1)\right] = np\left[(n-1)p + 1\right] = n(n-1)p^2 + np,\\
\nu_3 &= np\,\nu'_2(n-1) = np\left[\nu_2(n-1) + 2\nu_1(n-1) + \nu_0(n-1)\right] = n(n-1)(n-2)p^3 + 3n(n-1)p^2 + np,\\
\nu_4 &= np\,\nu'_3(n-1) = np\left[\nu_3(n-1) + 3\nu_2(n-1) + 3\nu_1(n-1) + \nu_0(n-1)\right]\\
&= np\bigl\{\left[(n-1)(n-2)(n-3)p^3 + 3(n-1)(n-2)p^2 + (n-1)p\right] + 3\left[(n-1)(n-2)p^2 + (n-1)p\right] + 3\left[(n-1)p\right] + 1\bigr\}\\
&= n(n-1)(n-2)(n-3)p^4 + 6n(n-1)(n-2)p^3 + 7n(n-1)p^2 + np.
\end{aligned}
$$
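The method of continuation is mechanical enough to code directly. The sketch below (our own naming, exact rational arithmetic) builds $\nu_k$ for the series $(q + p)^n$ from (6) and (8). For the base case it takes the degenerate series $(q + p)^0$, whose single item sits at $x = 0$, and it compares the result with direct summation over Table I.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


def raw_moments_by_continuation(n, p, order):
    """nu_0 ... nu_order for the binomial series (q + p)^n, using (6) and (8)."""
    p = Fraction(p)

    @lru_cache(maxsize=None)
    def nu(k, m):
        # k-th moment about x = 0 of the series generated by (q + p)^m
        if k == 0:
            return Fraction(1)
        if m == 0:
            return Fraction(0)  # all the frequency is at x = 0
        # (8): moment of order k - 1 about x = -1, built from the moments about x = 0
        nu_prime = sum(comb(k - 1, j) * nu(j, m - 1) for j in range(k))
        return m * p * nu_prime  # (6), with n replaced by m

    return [nu(k, n) for k in range(order + 1)]


def raw_moments_direct(n, p, order):
    """The same moments summed straight from Table I (divided by N)."""
    p = Fraction(p)
    q = 1 - p
    return [sum(x ** k * comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1))
            for k in range(order + 1)]


n, p = 7, Fraction(3, 10)
assert raw_moments_by_continuation(n, p, 8) == raw_moments_direct(n, p, 8)
nu4 = raw_moments_by_continuation(n, p, 4)[4]
assert nu4 == (n * (n - 1) * (n - 2) * (n - 3) * p ** 4 + 6 * n * (n - 1) * (n - 2) * p ** 3
               + 7 * n * (n - 1) * p ** 2 + n * p)
```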


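If the order of the terms in the expansion is reversed, $\nu_n$ is an ascending power series in $p$. The pure numerical coefficients in some of these $\nu$'s are

$$
\begin{aligned}
\nu_1 &= (1)\\
\nu_2 &= (1,\ 1)\\
\nu_3 &= (1,\ 3,\ 1)\\
\nu_4 &= (1,\ 7,\ 6,\ 1)\\
\nu_5 &= (1,\ 15,\ 25,\ 10,\ 1)\\
\nu_6 &= (1,\ 31,\ 90,\ 65,\ 15,\ 1)\\
\nu_7 &= (1,\ 63,\ 301,\ 350,\ 140,\ 21,\ 1)\\
\nu_8 &= (1,\ 127,\ 966,\ 1701,\ 1050,\ 266,\ 28,\ 1).
\end{aligned}
$$

In general,

$$\nu_{n+1} = \Bigl(1,\ \sum_{s=1}^{n}{}_nC_s,\ \sum_{s=2}^{n}{}_nC_s\,c_{s,2},\ \ldots,\ \sum_{s=j}^{n}{}_nC_s\,c_{s,j},\ \ldots,\ 1\Bigr), \qquad (9)$$

where $c_{s,j}$ denotes the $j$th coefficient in the list for $\nu_s$.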
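Rule (9) reproduces the listed rows. In the sketch below (ours), `rows[s][i]` holds the $(i+1)$th coefficient of $\nu_s$, and each new row is built from the earlier ones as (9) prescribes. These coefficients are the Stirling numbers of the second kind, though the paper does not use that name.

```python
from math import comb


def coefficient_rows(max_order):
    """Pure numerical coefficients of nu_1 ... nu_max_order, generated with rule (9)."""
    rows = {1: [1]}
    for n in range(1, max_order):
        row = [1]
        # entry i (0-based) of nu_{n+1} = sum over s >= i of nC_s times entry i-1 of nu_s
        for i in range(1, n + 1):
            row.append(sum(comb(n, s) * rows[s][i - 1] for s in range(i, n + 1)))
        rows[n + 1] = row
    return rows


for order, row in coefficient_rows(8).items():
    print(f"nu_{order} = {tuple(row)}")
```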

Using the foregoing $\nu$'s, and the semi-recursion formula, we are able to determine the $\mu$'s.

$$
\begin{aligned}
\mu_2 &= \nu_2 - \nu_1^2\\
&= \left[np + n(n-1)p^2\right] - (np)^2\\
&= np(1-p)\\
&= npq.
\end{aligned}
$$

$$
\begin{aligned}
\mu_3 &= \nu_3 - 3\nu_1\mu_2 - \nu_1^3\\
&= \left[np + 3n(n-1)p^2 + n(n-1)(n-2)p^3\right] - 3(np)\left[np(1-p)\right] - (np)^3\\
&= np + (-3n)p^2 + (2n)p^3 = np(1 - 3p + 2p^2)\\
&= np(1-p)(1-2p)\\
&= npq(q-p).
\end{aligned}
$$

$$
\begin{aligned}
\mu_4 &= \left[np + 7n(n-1)p^2 + 6n(n-1)(n-2)p^3 + n(n-1)(n-2)(n-3)p^4\right] - 4(np)(np)(1 - 3p + 2p^2) - 6(np)^2(np)(1-p) - (np)^4\\
&= np + (-7n + 3n^2)p^2 + (12n - 6n^2)p^3 + (-6n + 3n^2)p^4\\
&= np(1 - 7p + 12p^2 - 6p^3) + 3n^2p^2(1 - 2p + p^2)\\
&= npq - 6np^2q^2 + 3n^2p^2q^2.
\end{aligned}
$$

$$
\begin{aligned}
\mu_5 &= np(1 - 15p + 50p^2 - 60p^3 + 24p^4) + 10n^2p^2(1 - 4p + 5p^2 - 2p^3)\\
&= (q-p)(npq - 12np^2q^2 + 10n^2p^2q^2).
\end{aligned}
$$

$$
\begin{aligned}
\mu_6 &= np(1 - 31p + 180p^2 - 390p^3 + 360p^4 - 120p^5) + 5n^2p^2(5 - 36p + 83p^2 - 78p^3 + 26p^4) + 15n^3p^3(1 - 3p + 3p^2 - p^3)\\
&= npq - 30np^2q^2(q-p)^2 + 25n^2p^2q^2 - 130n^2p^3q^3 + 15n^3p^3q^3.
\end{aligned}
$$

$$
\begin{aligned}
\mu_7 &= np(1 - 63p + 602p^2 - 2100p^3 + 3360p^4 - 2520p^5 + 720p^6)\\
&\quad + n^2p^2(56 - 686p + 2590p^2 - 4270p^3 + 3234p^4 - 924p^5) + n^3p^3(105 - 525p + 945p^2 - 735p^3 + 210p^4)\\
&= (q-p)(npq - 60np^2q^2 + 360np^3q^3 + 56n^2p^2q^2 - 462n^2p^3q^3 + 105n^3p^3q^3).
\end{aligned}
$$

$$
\begin{aligned}
\mu_8 &= np(1 - 127p + 1932p^2 - 10206p^3 + 25200p^4 - 31920p^5 + 20160p^6 - 5040p^7)\\
&\quad + n^2p^2(119 - 2394p + 13895p^2 - 35700p^3 + 46004p^4 - 29232p^5 + 7308p^6)\\
&\quad + n^3p^3(490 - 3850p + 10990p^2 - 14770p^3 + 9520p^4 - 2380p^5) + n^4p^4(105 - 420p + 630p^2 - 420p^3 + 105p^4)\\
&= npq\left[1 - 42pq\bigl(3 - 40pq(1 - 3pq)\bigr)\right] + 7n^2p^2q^2\left[17 - 4pq(77 - 261pq)\right] + 70n^3p^3q^3(7 - 34pq) + 105n^4p^4q^4.
\end{aligned}
$$
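Because the algebra behind $\mu_5$ to $\mu_8$ is long, an exact numerical check is reassuring. The sketch below (ours) computes the central moments of the binomial series directly with rational arithmetic and compares them with the factored forms above for several $n$ and $p$.

```python
from fractions import Fraction
from math import comb


def central_moment(n, p, r):
    """r-th moment about the mean of the binomial series, summed exactly from Table I."""
    p = Fraction(p)
    q = 1 - p
    mean = n * p
    return sum(comb(n, x) * p ** x * q ** (n - x) * (x - mean) ** r for x in range(n + 1))


def factored_forms(n, p):
    """The factored expressions for mu_2 ... mu_8 given above, written with P = pq."""
    p = Fraction(p)
    q = 1 - p
    P = p * q
    return {
        2: n * P,
        3: n * P * (q - p),
        4: n * P - 6 * n * P ** 2 + 3 * n ** 2 * P ** 2,
        5: (q - p) * (n * P - 12 * n * P ** 2 + 10 * n ** 2 * P ** 2),
        6: (n * P - 30 * n * P ** 2 * (q - p) ** 2 + 25 * n ** 2 * P ** 2
            - 130 * n ** 2 * P ** 3 + 15 * n ** 3 * P ** 3),
        7: (q - p) * (n * P - 60 * n * P ** 2 + 360 * n * P ** 3 + 56 * n ** 2 * P ** 2
                      - 462 * n ** 2 * P ** 3 + 105 * n ** 3 * P ** 3),
        8: (n * P * (1 - 42 * P * (3 - 40 * P * (1 - 3 * P)))
            + 7 * n ** 2 * P ** 2 * (17 - 4 * P * (77 - 261 * P))
            + 70 * n ** 3 * P ** 3 * (7 - 34 * P) + 105 * n ** 4 * P ** 4),
    }


for n in (1, 2, 5, 12):
    for p in (Fraction(1, 10), Fraction(1, 3), Fraction(1, 2), Fraction(4, 5)):
        forms = factored_forms(n, p)
        assert all(forms[r] == central_moment(n, p, r) for r in range(2, 9)), (n, p)
print("mu_2 ... mu_8 agree with direct summation")
```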



ON CERTAIN DISTRIBUTION FUNCTIONS WHEN THE LAW OF THE
UNIVERSE IS POISSON'S FIRST LAW OF ERROR¹

By Frank M. Weida

Introduction. The median, which is that value of a permitted variable which has as many observed values on one side of it as on the other, appears to be the natural competitor of the arithmetic mean when we are interested in the probable or most probable value of an unknown quantity. It is well known² that the law of probability, namely, Poisson's first law of error, which results from the assumption that the median is the most probable value of the unknown quantity is given by

$$f(x) = \frac{k}{\sigma}\,e^{-|x|/\sigma}. \qquad (1)$$

Little is known about the form of the distribution functions of the more important statistics when the law of the "Universe" is Poisson's first law of error. It, therefore, appears to be of interest and importance to enlarge our present knowledge of distribution functions by finding certain new ones when the variable or variables are defined by (1).

In this paper we present the following results: (1) we have obtained an explicit expression for the distribution of means of samples of $n$; (2) we have obtained an explicit expression for the distribution of differences; (3) we have obtained an explicit expression for the distribution of quotients; (4) we have obtained an explicit expression for the distribution of standard deviations for samples of $n$; (5) we have obtained an explicit expression for the distribution of geometric means for samples of $n$; (6) we have obtained an explicit expression for the distribution of harmonic means for samples of $n$.

In our analysis, we have made use of the theory of characteristic functions in the sense of Lévy.³ This theory has been extended to more than one dimension by V. Romanovsky⁴ and by E. K. Haviland.⁵ S. Kullback,⁶ in his thesis, has made further extensions and has applied them successfully to the distribution problem in statistics.

¹ Presented to the American Mathematical Society, February 23, 1933.

² Brunt, David: "The Combination of Observations," 1923, p. 27.

³ Lévy, P.: "Calcul des Probabilités," pp. 15:1-191.

⁴ Romanovsky, V.: "Sur un théorème limite du calcul des probabilités," Recueil mathématique de la Société mathématique de Moscow, Vol. 36, 1926, pp. 36-64.

⁵ Haviland, E. K.: "On the inversion formula for Fourier-Stieltjes transforms in more than one dimension," American Journal of Mathematics, Vol. 57, 1935, pp. 94-101.

⁶ Kullback, S.: "An application of characteristic functions to the distribution problem of statistics," Annals of Mathematical Statistics, Vol. V, No. 4, pp. 263-307.
The explicit expression for the distribution of arithmetic means of samples of $n$ is not new. This law of distribution has previously been obtained otherwise by F. Hausdorff⁷ and by A. T. Craig.⁸ It is inserted here to show the superiority and greater power of our method when compared with previous methods and for the completeness of our discussion. The other results offered in this paper, as far as the writer knows, are new.


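1. The distribution of arithmetic means. Let us consider

$$f(x) = \frac{k}{\sigma}\,e^{-|x|/\sigma}, \qquad (-\infty < x < \infty). \qquad (2)$$

If we assume that $x_1, x_2, \ldots, x_n$ are independently distributed and that each $x_i$ $(i = 1, 2, \ldots, n)$ is distributed according to the same distribution law, namely, Poisson's first law of error, then it is fairly easy to see that the characteristic function for the law of distribution of means of samples of $n$ is given by

$$\phi(t) = \left[\frac{k}{\sigma}\int_{-\infty}^{\infty} e^{-|x|/\sigma}\,e^{itx}\,dx\right]^n = \frac{2^n k^n}{(1 + \sigma^2t^2)^n}. \qquad (3)$$

If $u = \sum_{i=1}^{n} x_i$, then it follows that the distribution function of $u$, namely, $P(u)$, is given by

$$P(u) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-iut}\,\phi(t)\,dt, \qquad (4)$$

which, upon simplification becomes

$$P(u) = \frac{2^{n-1}k^n}{\pi}\int_{-\infty}^{\infty}\frac{e^{-iut}\,dt}{(1 - \sigma it)^n(1 + \sigma it)^n}. \qquad (5)$$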


It is readily seen that the poles of the integrand are of the $n$th order and are those of $(1 - \sigma it)^n(1 + \sigma it)^n$. It follows by the well known Residue Theorem of Cauchy⁹ that

$$P(u) = \frac{2^n k^n\, i^{\,n-1}}{(n-1)!\,\sigma^n}\left[\frac{d^{\,n-1}}{dt^{\,n-1}}\,\frac{e^{-iut}}{(1 + \sigma it)^n}\right]_{t = -i/\sigma}. \qquad (6)$$

If now, we replace $u$ by $n|\bar x|$, we will obtain the desired law of the distribution of arithmetic means of samples of $n$ which is

$$P(|\bar x|) = \frac{2^n k^n\, i^{\,n-1}}{(n-1)!\,\sigma^n}\left[\frac{d^{\,n-1}}{dt^{\,n-1}}\,\frac{e^{-in|\bar x|t}}{(1 + \sigma it)^n}\right]_{t = -i/\sigma}, \qquad (7)$$

defined for all values of $\bar x$ on the range $(-\infty < \bar x < \infty)$.

⁷ Hausdorff, F.: Beiträge zur Wahrscheinlichkeitsrechnung, Königlich Sächsischen Gesellschaft der Wissenschaften zu Leipzig. Berichte über die Verhandlungen, Math.-Phys. Classe, Vol. 53, 1901, pp. 152-178.

⁸ Craig, A. T.: "On the distributions of certain statistics," American Journal of Mathematics, Vol. 54, 1932, pp. 353-366.

⁹ MacRobert, T. M.: "Functions of a Complex Variable," 1933, pp. 57, 295.
A. T. Craig⁸ has given the distribution laws of arithmetic means of samples of size 2, 3, and 4. These results as well as the results for any $n$ are readily obtained from (7).
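As a check on the residue formula for $P(u)$, the sketch below (ours, not the author's) takes $\sigma = 1$ and $k = 1/2$, the value of $k$ that makes (1) integrate to one. Evaluating the derivative by hand then gives $P(u) = (1 + |u|)e^{-|u|}/4$ for $n = 2$ and $P(u) = (3 + 3|u| + u^2)e^{-|u|}/16$ for $n = 3$. The code compares these with a direct numerical evaluation of the inversion integral for $P(u)$, truncated at $|t| \le 60$.

```python
import math


def density_by_inversion(u, n, T=60.0, steps=60_000):
    """(1/2pi) * integral of exp(-iut) / (1 + t^2)^n dt, with sigma = 1 and k = 1/2.

    The integrand's real part is even in t and its imaginary part is odd, so the
    integral equals (1/pi) * int_0^T cos(ut) / (1 + t^2)^n dt (trapezoid rule, truncated at T).
    """
    h = T / steps
    total = 0.0
    for j in range(steps + 1):
        t = j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * math.cos(u * t) / (1.0 + t * t) ** n
    return total * h / math.pi


def density_from_residues(u, n):
    """The residue formula evaluated by hand for sigma = 1, k = 1/2 (n = 2 and n = 3 only)."""
    a = abs(u)
    if n == 2:
        return (1 + a) * math.exp(-a) / 4
    if n == 3:
        return (3 + 3 * a + a * a) * math.exp(-a) / 16
    raise ValueError("this sketch only covers n = 2 and n = 3")


for n in (2, 3):
    for u in (0.0, 1.0, 2.5):
        print(n, u, round(density_by_inversion(u, n), 6), round(density_from_residues(u, n), 6))
```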


2. The distribution of differences. Let us assume that the laws of distribution of $x$ and $y$ are independent and that they are given respectively by

$$f(x) = \frac{k_1}{\sigma_1}\,e^{-|x|/\sigma_1};\qquad f(y) = \frac{k_2}{\sigma_2}\,e^{-|y|/\sigma_2};\qquad (-\infty < x < \infty),\ (-\infty < y < \infty).$$

In this case, the characteristic function of the law of distribution of differences $(x - y)$ is given by

Ul 




/ c "1 / c "2 fly. 

J~a 0’2 J-a 


Performing the operations indicated in (8) and simplifying, we find that

<b(t) = 1 

(T\<T2 (1 — (Tlit) (1 + (T2^t) * 

It is fairly easy to see that the distribution law of $u$ is given by

e-un 


p(») = 

ZTr<r\<r2 J-a 


(1 — (Tiity{\ 

Now, let l(l/ai) — 771 = c/u, then (10) becomes 


Piu) = 

jr)ori<r2(<ri + ct.^) /_ jl.. , „ 


(- »') 


e~'’ dv 

1 + 


\ (T\(T2 /) 


The integral in (13) is convergent because 

dv 

1 + 


Lim r" 


= 0. 


(- «') ] 

Hence, we find that 

P(«) . - r 

Trlcria^Kcri + G2) Ja 


\ <r\(T2 / 


e~'' dv 


{-V) 


1 + 


\ <riai J) 


( 8 ) 


(9) 


(10) 


( 11 ) 


( 12 ) 
which upon simplification becomes

4fcifc2 


P{u) 


W 


(Ti(T2{<Fl + <^ 2 ) ^'2 \ 0‘l0'2 


(Ti + 0-2 


m|, 


where $W_{k,m}(z)$ is the confluent hypergeometric function.¹⁰

"’ 2 ' I <r\(r2 J 

It is well known that 


(13) 


i (1 + J ^ 


for all values of k and m and for all values of z except negative real values. 
Clearly, 


w = re-‘ 

"■2 \ ffiffi J r(i) Jo 


dl 


which, upon simplification becomes 


W 


0, 


(Ti + <r2 


2 i, 




Hence, we now find that 

P{u) = 


I < ‘^9 


Akxk2 

' / 

+ <^2) 


(14) 


(15) 


If now, we replace $u$ by $|x| - |y|$, we will obtain the desired law of distribution of differences which is


C(|a-|-|//|) = 


4A-,it2 


' |U| -|k|| 


<ri<r2(<ri + Oj) 


(lb) 


3. The distribution of ratios. We assume that the laws of distribution of $x$ and $y$ are independent and that they are given respectively by

$$f(x) = \frac{k_1}{\sigma_1}\,e^{-|x|/\sigma_1};\qquad f(y) = \frac{k_2}{\sigma_2}\,e^{-|y|/\sigma_2};\qquad (-\infty < x < \infty),\ (-\infty < y < \infty).$$


¹⁰ Whittaker, E. T. and Watson, G. N.: "A Course of Modern Analysis," 1915, pp. 333-334.
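Let $u = \log|x| - \log|y|$. The characteristic function of the law of distribution of quotients is then given by

$$\phi(t) = \int_{-\infty}^{\infty}\frac{k_1}{\sigma_1}\,e^{-|x|/\sigma_1}\,|x|^{it}\,dx\int_{-\infty}^{\infty}\frac{k_2}{\sigma_2}\,e^{-|y|/\sigma_2}\,|y|^{-it}\,dy
= \frac{4k_1k_2}{\sigma_1\sigma_2}\int_0^{\infty}e^{-x/\sigma_1}x^{it}\,dx\int_0^{\infty}e^{-y/\sigma_2}y^{-it}\,dy.$$

Now, let $s = x/\sigma_1$ and $w = y/\sigma_2$, then clearly

$$\phi(t) = 4k_1k_2\,\sigma_1^{it}\sigma_2^{-it}\int_0^{\infty}e^{-s}s^{it}\,ds\int_0^{\infty}e^{-w}w^{-it}\,dw,$$

whence

$$\phi(t) = 4k_1k_2\,\sigma_1^{it}\sigma_2^{-it}\,\Gamma(it+1)\,\Gamma(1-it). \qquad (18)$$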

It follows that the distribution law of u is given by 

P{u) = / e-t<u+tioga^<-iiog<r2/p(ir^ ^ 1) Y(i _ ii) dt 

27r J —a 

which upon simplification, becomes 

P{u) f + 1) r(l - iO d<. (19) 

TT J-a 


Now, let $(1 - it) = -v$, then (19) becomes

/>(„)== + (20) 

27rz 7-1 I a 

Since it can be shown¹¹ that

$$\frac{1}{2\pi i}\int_{-1-i\infty}^{-1+i\infty} e^{-vu}\,\Gamma(2+v)\,\Gamma(-v)\,dv = \Gamma(2)\left(1 + \frac{1}{e^u}\right)^{-2},$$

we find that (20) becomes

Pin) = r(2) { 1 + . (21) 

(T2 ( o'2e^*J 

Now, put $e^u = |x|/|y| = R$, whence from (21) we will obtain the desired law of distribution of quotients which is


Pirn = 


Akxk^iT(2) 

0*2/2 



( 22 ) 
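For the special case $\sigma_1 = \sigma_2 = 1$ and $k_1 = k_2 = 1/2$ (the value of $k$ that makes (1) a proper density), (18) reduces to $\Gamma(1 + it)\Gamma(1 - it) = \pi t/\sinh \pi t$, a standard consequence of the reflection formula. Inverting it should then give the logistic density $e^{-u}/(1 + e^{-u})^2$, which is what the contour identity quoted above yields once the factor $e^{-u}$ produced by the substitution $1 - it = -v$ is included. The sketch below (ours) performs that check numerically.

```python
import math


def kernel(t):
    """Gamma(1 + it) * Gamma(1 - it) = pi t / sinh(pi t), a real even function of t."""
    if t == 0.0:
        return 1.0
    x = math.pi * t
    return x / math.sinh(x)


def density_of_log_ratio(u, T=30.0, steps=30_000):
    """(1/2pi) * integral of exp(-iut) * kernel(t) dt, by the trapezoid rule on [0, T].

    Because kernel is real and even, the integral equals (1/pi) * int_0^T cos(ut) kernel(t) dt.
    """
    h = T / steps
    total = 0.0
    for j in range(steps + 1):
        t = j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * math.cos(u * t) * kernel(t)
    return total * h / math.pi


def logistic(u):
    return math.exp(-u) / (1.0 + math.exp(-u)) ** 2


for u in (0.0, 0.7, 2.0):
    print(u, round(density_of_log_ratio(u), 8), round(logistic(u), 8))
```

Since $u = \log R$, this is the same as saying that, in this special case, $R = |x|/|y|$ has density $1/(1 + R)^2$.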


MacRobert, T. M.: "Functions of a Complex Variable," 1933, pp. 114, 139, 151.
Whittaker, E. T. and Watson, G. N.: "A Course of Modern Analysis," 1915, p. 283.
4. The distribution of variances and standard deviations. If we assume
that the variance and standard deviation are calculated about a sample mean 

n — 1 

and if we let $u = \sum_{i=1}^{n} x_i^2$, and if the $x_i$ are independently distributed and each $x_i$ is distributed according to the same distribution law, namely, Poisson's first law of error, then it is clear that the characteristic function for the law of distribution of variances of samples of $n$ is


4>^t) 


Let $I$ represent the integral in the right-hand member of (23). We obtain

_ ^ 

that (dl/do) = I/a^f whence I = Ce Making use of the conditions: 

^ -r. VtT 


f: 


dx 


VI’ 


<T a, Ce ^ C, whence we find that 


/: 


dx 




Clearly, it follows that 


<#•«) = 


( »i 1 ) r » » _ I 

)n — lr.n — 1- 4 


n--J. 

2~ 


2 _ " ' ’ 
— e ^ 


n — i 4. 

a t 

We now find that the distribution law of u is given by 

(n — l)iri ?i— 1 n I 

rr ^ e 


P(u) = 


4 _ 2 


2jr<r'‘ 




dl. 


(24) 


(25) 


Evaluating the integral in (25) with a suitably chosen contour,¹² we find that

P(m) = ® • (26) 


2„-. r(5-5-!) 


Now, let $u = \sum_{i=1}^{n} x_i^2 = ns^2$, whence from (26) we will obtain the desired law of distribution of variances which is


P(s*) = 


n — 1 _ n — 1 

On-ljLn-1 2 ^ ILll, 

I H r e ^ , ^-. 2 _ 




( 27 ) 


¹² MacRobert, T. M.: "Functions of a Complex Variable," 1933, p. 07.
The law of distribution of standard deviations can be obtained at once from (27) since $d(s^2) = 2s\,ds$.

We shall now give the specific laws of distribution of variances for samples of size 1, 2, 3, 4, and 5 when the law of the "Universe" is Poisson's first law of error. From (27),

For n = 1, 


P{s^) = 0, 


Forn = 2, 


P(s^) 


1 - l -,2 
2^ke 


For n = 3, 


P(8^) 


-.,2 

4fcVc 


(0 < < oo). ^28) 


(0 < < 00 ) . 


(29) 


(0 < < oo). (30) 


For n = 4, 


Pis^) = 


32A:Ve 'sc-*’ 
<r»' 


For n = 5, 


p(»^) = 




(0 < s* < oo). (31) 


(0 < s’ < 00 ) . (32) 


5. The distribution of geometric means. As before, we assume that the $x_i$ are independently distributed and each $x_i$ is distributed according to the same distribution law, namely, Poisson's first law of error. Then, clearly, the characteristic function for the law of distribution of geometric means of samples of $n$ is


m) = 



Now, put $s = x/\sigma$, then (33) becomes


$$\phi(t) = 2^n k^n \sigma^{int}\,[\Gamma(it+1)]^n.$$


It follows at once that the distribution law of u is 


P{u) == 



g-.(u+«log.)ljp(ji 4. Djndt. 


(33) 


(34) 


(35) 
Now, let $it + 1 = -v$, then (35) becomes

/ -l+ia 

gv(«+nlog<r)|p(_^) ^ 

1— ta 




P(u) = — - — 


2Ti 

It is well known (10) that

$$[\Gamma(-v)]^n = \frac{(-\pi)^n}{\sin^n \pi v\,[\Gamma(v+1)]^n}.$$


Using (37) in (36), we readily find that 

g»’(«+nloK<r)( 

nx 


On l^n 

P(u) = _____ e«+nlu(ti 


2n 


'(v + 1) j’'sin’‘jrt' 


civ. 


(36) 


(37) 


(38) 


It is fairly easy to see that the poles of the integrand in (38) are the poles of $[\Gamma(-v)]^n$ and that these poles are of the $n$th order. Applying the well known Residue Theorem of Cauchy (8), we find that


P(m) = 2'‘/!'"e“+"'“*‘' 


2 (_l)n+no+l p gt.(u+«loK<r) I'l^ 

_ (n-1)! 1*"-* Lll^T)l'‘J/.-a- 

Now, since $u = \log|x_1| + \log|x_2| + \cdots + \log|x_n|$, then clearly, the distribution law of the geometric mean, $G$, is obtained from the law of distribution for $u$ by means of the transformation

$$u = \log G^n.$$

Hence, from (39), we find the desired law of distribution of geometric means of samples of $n$ which is

r(«) W‘-'Llr(«'+ 1)1 


6. The distribution of harmonic means. Let us assume that $f(x)$ is the law of distribution for $x$. It is well known¹³ that the law of distribution of $x' = 1/x$ is given by

$$F(x') = \frac{1}{x'^2}\,f\!\left(\frac{1}{x'}\right)$$

if $1/x$ is continuous on the range of definition of $f(x)$. Now, in case $f(x)$ is Poisson's first law of error, we find that

$$F(x') = \frac{k}{\sigma x'^2}\,e^{-1/(\sigma|x'|)};\qquad (-\infty < x' < 0),\ (0 < x' < \infty). \qquad (41)$$

¹³ Dodd, E. L., "The frequency law of a function of one variable," Bulletin of the American Mathematical Society, Vol. 31, 1925, p. 28; "The frequency law of a function of variables with given frequency laws," Annals of Mathematics, Second Series, Vol. 27, 1925-26, p. 18.
We assume that the $x'_i$ are independently distributed and each $x'_i$ is distributed according to the same law of distribution, whence we find that the characteristic function for the law of distribution of harmonic means of samples of $n$ is


<^(0 


-{i: 






from which, after simplification, we find that 


«(0 = 


(1 — ’ 

We now find that the law of distribution for u is 


. 2"A;«<r2« rot 


which, after evaluation and simplification, becomes 


(42) 


(43) 


P(u) = 


2“*“ 

^»r(3n) 


u 


(44) 


Recalling that in this case, $u = 1/|x_1| + 1/|x_2| + \cdots + 1/|x_n|$, we make the transformation $u = n/H$, where $H$ is the harmonic mean; whence, from (44), we find that the desired law of distribution of harmonic means of samples of $n$ is given by


ivn = 


2 


• rp-^»e 


n 

777 ^ 


(45) 


7. Conclusions. We have shown that the same analysis is applicable to find the explicit expression for all the distribution laws we have discussed in this paper.

The George Washington University, 

Washington, D. C. 



ON THE PROBLEM OF CONFIDENCE INTERVALS

By J. Neyman

When discussing my paper read before the Royal Statistical Society on 19th June, 1934, Professor Fisher said that the extension of his work concerning the fiducial argument to the case of discontinuous distributions, as presented in my paper, has been reached at a great expense: that instead of exact probability statements we get only statements in the form of inequalities.

This remark raises the question whether the disadvantage of the solution which he mentioned (the inequalities instead of equalities) results from the unsatisfactory method of approach, or whether it is connected with the nature of the problem itself.

I think that the problem is of considerable general interest. For instance it may be asked whether the confidence intervals for the binomial distribution recently published by E. S. Pearson and C. J. Clopper,¹ which correspond to the probability statements in inequalities, could be bettered.

The purpose of the present note is to show (1) that in some exceptional cases the exact probability solution of the problem exists and that then it may easily be found by the method described in Note I of my paper, and (2) that in the general case of discontinuous distribution exact probability statements in the problem of confidence intervals are impossible.

In particular it will be seen that exact probability statements are impossible in the case of the binomial distribution, and so the system of confidence intervals published by Clopper and Pearson could not be bettered.

In order to avoid any possible misunderstanding I shall start by restating the problem.

We shall consider a random discontinuous variate $x$, capable of having one or another of a finite, or at most denumerable set of values

$$x_1,\ x_2,\ \ldots,\ x_n,\ \ldots \qquad (1)$$

We shall assume that the frequency function, say $p(x \mid \theta)$, of $x$ depends upon one parameter $\theta$, the value of which is unknown. The problem of confidence intervals consists in ascribing to every possible value of $x$, e.g. to $x_n$ $(n = 1, 2, \ldots)$, a "confidence interval," say $\theta_1(n)$ to $\theta_2(n)$, such that the probability, $P$, of our being correct in stating

$$\theta_1(n) \le \theta \le \theta_2(n) \qquad (2)$$

whenever we observe $x = x_n$ $(n = 1, 2, \ldots)$, is either:

¹ E. S. Pearson and C. J. Clopper: The Use of Confidence or Fiducial Limits in the Case of the Binomial. Biometrika Vol. XXVI, pp. 404-413.

² J. R. S. S. Vol. 97, p. 589.
(a) equal to a given value $\alpha < 1$ chosen in advance, or

(b) at least equal to this value $\alpha$.

I proposed to call this chosen value $\alpha$ the confidence coefficient.

In the earlier paper I showed that the solution of the problem in its form (b) is always possible and easy to find. If the variate $x$ is continuous, then the solution of the problem (a) is equally easy. At present we shall consider whether and under what conditions the solution (a) is possible when the variate $x$ is discontinuous.

Suppose that the variate $x$ is discontinuous as described above, and that the solution of the problem in its form (a) exists and is given by the system of confidence intervals $(\theta_1(x_n), \theta_2(x_n))$ for $n = 1, 2, \ldots$.

The position is illustrated in the diagram below. On the axis of abscissae the possible values of the variate $x$ are marked. The axis of ordinates is the axis of $\theta$. The confidence intervals are marked on verticals passing through corresponding values of $x$.


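[Diagram representing the confidence intervals: the possible values $x_1, x_2, x_3, \ldots$ of the variate are marked on the axis of abscissae and $\theta$ on the axis of ordinates; the confidence interval from $\theta_1(n)$ to $\theta_2(n)$ is drawn on the vertical through $x_n$, a horizontal line LL is drawn at a fixed value of $\theta$, and a dot marks a point belonging to the set of acceptance $X(\theta)$.]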


According to our hypothesis the intervals $(\theta_1(x_n), \theta_2(x_n))$ are so chosen that

$$P = \alpha. \qquad (3)$$

$P$ is the probability of an event, say $E$, which we shall describe in some detail. Let us denote generally the probability of any event $a$ by $P\{a\}$. $P\{a \mid b\}$ will denote the probability of an event, $a$, calculated under the assumption that another event, $b$, has already occurred.

Now

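$$
\begin{aligned}
P = P\{E\} = \text{the probability that } \{&\text{either } (x = x_1) \text{ and then } \theta_1(1) \le \theta \le \theta_2(1)\\
&\text{or } (x = x_2) \text{ and then } \theta_1(2) \le \theta \le \theta_2(2)\\
&\cdots\\
&\text{or } (x = x_n) \text{ and then } \theta_1(n) \le \theta \le \theta_2(n)\\
&\cdots\}
\end{aligned}
$$

$$
\begin{aligned}
&= P\{x = x_1\}\,P\{\theta_1(1) \le \theta \le \theta_2(1) \mid (x = x_1)\} + P\{x = x_2\}\,P\{\theta_1(2) \le \theta \le \theta_2(2) \mid (x = x_2)\} + \cdots\\
&= \sum_{n=1}^{\infty} P\{x = x_n\}\,P\{\theta_1(n) \le \theta \le \theta_2(n) \mid (x = x_n)\} = \alpha. \qquad (4)
\end{aligned}
$$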

The calculation of the probability $P$ in the above form is not convenient, as both multipliers in each term of the sum in (4) depend upon the unknown probability function a priori of $\theta$. Therefore we shall present $P$ in another form, giving to the event $E$ a geometrical interpretation. Let us denote by CB the set of all confidence intervals $(\theta_1(n), \theta_2(n))$, as marked on the plane of $x$ and $\theta$. Thus CB will be composed of points with co-ordinates $x$ and $\theta$, where

$$x = x_n,\qquad n = 1, 2, \ldots,\qquad \theta_1(n) \le \theta \le \theta_2(n). \qquad (5)$$

The set CB will be called the confidence belt.

Denote by $A$ any point of the plane of $x$ and $\theta$, having any values for its co-ordinates.

It is easily seen that the event, which we denoted by $E$ and the probability of which is $P = \alpha$, consists in the point $A$ belonging to the confidence belt CB. In fact the event $E$ occurs if and only if the co-ordinates of $A$ fulfil the conditions (5). But just these conditions define the points belonging to CB.

The above circumstance allows us to calculate $P$ by means of a formula which discloses its connection with $p(x \mid \theta)$.

Fix any possible value of $\theta = \theta'$ and draw the straight line LL the points of which have just this fixed value $\theta'$ for their ordinates. The line LL will cut some of the confidence intervals. Denote by $X(\theta')$ the set of points of intersection, and by $\phi(\theta)$ the unknown frequency function of $\theta$. The set $X(\theta)$ will be called the set of acceptance corresponding to the specified value of $\theta$.

The function $\phi(\theta)$ may be continuous or not. So may be $p(x \mid \theta)$ considered as a function of $\theta$. These cases may be treated together if we agree that $\mathop{S}_{\theta} F(\theta)$ will denote either the sum or the integral of $F(\theta)$ extending over all values of $\theta$, whenever $F(\theta)$ is integrable.
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Using this notation wo may write 


P = P[E} 


e I x{B) 


( 6 ) 


where ^ denotes the summation over all values of x belonging to X{d), 
x(e) 

From the formula (6) may be deduced the following important proposition. 
The probability P may possess a constant value a, independent of the properties 
of the unknown function if and only if for each 6 


E (p(^ 1 ^)) = « • 

Xii) 


(7) 


The condition (7) is obviously sufficient to have $P = \alpha$. In fact, if it is satisfied, then we should get from (6)

$$P = \alpha \mathop{S}_{\theta}\phi(\theta) = \alpha \qquad (8)$$

since $\mathop{S}_{\theta}\phi(\theta) = 1$ whatever the frequency distribution of $\theta$. It is equally easy to see that the condition (7) is necessary for having $P = \alpha$ whatever the function $\phi(\theta)$. For suppose that for $\theta = \theta_1$ we have

$$\sum_{X(\theta_1)} p(x \mid \theta_1) = \beta \neq \alpha. \qquad (9)$$

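Then if it happens that

$$\phi(\theta) = 1 \quad \text{for } \theta = \theta_1 \qquad (10)$$

and

$$\phi(\theta) = 0 \quad \text{for } \theta \neq \theta_1, \qquad (11)$$

the only term in the sum $\mathop{S}_{\theta}$ which is different from zero will be that corresponding to $\theta = \theta_1$, and the formula (6) will reduce to

$$P = \sum_{X(\theta_1)} p(x \mid \theta_1) = \beta \neq \alpha. \qquad (12)$$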
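To make formulas (6) to (12) concrete, the following Python sketch evaluates $P$ for a binomial $p(x \mid \theta)$ and a discrete distribution $\phi(\theta)$. The acceptance rule `accept_near_mean` (accept every $x$ within 1 of $n\theta$) and the two priors are hypothetical choices made only for illustration. They do not form a confidence belt from the text. When $\phi$ is concentrated at $\theta_1$, as in (10) and (11), $P$ reduces to the inner sum at $\theta_1$, as in (12). Under a different prior $P$ changes, because the inner sum is not the same for every $\theta$.

```python
from fractions import Fraction
from math import comb


def binom_pmf(x, n, theta):
    """Binomial probability p(x | theta) of x successes in n trials."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)


def accept_near_mean(theta, n):
    """Hypothetical set of acceptance X(theta): every x within 1 of n * theta."""
    return [x for x in range(n + 1) if abs(x - n * theta) <= 1]


def coverage(theta, n):
    """Inner sum of (6): probability that x falls in X(theta) when theta is true."""
    return sum(binom_pmf(x, n, theta) for x in accept_near_mean(theta, n))


def prob_E(prior, n):
    """Formula (6) for a discrete prior given as {theta: phi(theta)}."""
    return sum(weight * coverage(theta, n) for theta, weight in prior.items())


n = 4
theta1, theta2 = Fraction(1, 4), Fraction(1, 2)
point_mass = prob_E({theta1: Fraction(1)}, n)  # prior (10)-(11): reduces to (12)
mixture = prob_E({theta1: Fraction(1, 2), theta2: Fraction(1, 2)}, n)
print(point_mass, mixture)  # 243/256 467/512
```

Exact fractions are used so that any difference between the two values of $P$ cannot be attributed to rounding.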


The original question, whether the solution of the form (a) is possible when the variate $x$ is discontinuous, is thus put in the following form: is it possible to define for every possible value of $\theta$ a set of acceptance $X(\theta)$ such that the equation (7) holds good?

The answer is: in some cases it may be possible, but this depends upon the nature of the function $p(x \mid \theta)$. It is very easy to invent functions $p(x \mid \theta)$ for which the equation (7) for a definite value of $\alpha$ holds good, and we may even fix in advance the sets of acceptance $X(\theta)$. However, the important question is not whether there may exist elaborately invented cases of discontinuous distributions where the solution (a) exists, but rather whether this solution exists always, or at least whether it exists frequently and in cases which are practically important.
This question must be answered in the negative on the basis of the following example concerning the most important of the discontinuous distributions, the Binomial.

In fact it will be seen below that if $x$ is a variate following the binomial frequency law, then whatever the arrangement of the sets of acceptance $X(\theta)$ corresponding to different values of $\theta$, the left hand side of the equation (7) cannot be constantly equal to the confidence coefficient $\alpha < 1$. It will follow that in the case of the binomial distribution, the solution of the problem (a) is impossible.

To prove this we shall consider the variate, $x$, following the binomial frequency law. That is to say we shall assume that $x$ may have values $0, 1, 2, \ldots, n$, and that

$$p(x \mid \theta) = \binom{n}{x}\theta^{x}(1-\theta)^{n-x} \qquad (13)$$

while $0 < \theta < 1$. Since the set of possible values which $x$ may have is finite, the set of all confidence intervals must be finite also. It follows that only a finite number of sets of acceptance $X(\theta)$ is possible. Therefore there must be at least one set of acceptance, say $X^{0}$, which will be common to an infinite number of values of $\theta$, say $\theta_1, \theta_2, \ldots, \theta_n, \ldots$, so that for each of them $X(\theta_n) = X^{0}$.

Now

$$\sum_{X(\theta_n)} p(x \mid \theta_n) \qquad (14)$$

for all these values of $\theta = \theta_n$ will be the same polynomial in $\theta$ of the order $n$. If it has the same value $\alpha$ for a number of values of $\theta$ exceeding $n$, it means that this polynomial is an absolute constant. Therefore, if it were possible to give a solution of the type (a) in the case of the binomial distribution, it would be possible to construct a sum (14), the terms of which are all different and have the form (13), and such that after all possible reductions and simplifications all terms involving $\theta$ would cancel and we should be left only with one constant term $\alpha < 1$. This, however, is impossible, since the only term of the form (13) which involves a constant is the term corresponding to $x = 0$,

$$p(0 \mid \theta) = (1-\theta)^{n} = 1 - n\theta + \cdots + (-1)^{n}\theta^{n}, \qquad (15)$$

and then this constant is 1. Other terms of the form (13) involve $\theta^{x}$ as a multiplier. Therefore there exists only one sum of the form (14) which is an absolute constant, but this includes all the terms (13),

$$\sum_{x=0}^{n} p(x \mid \theta) = 1, \qquad (16)$$

and thus is of no value. It follows that whatever the sets of acceptance $X(\theta)$, the corresponding sum (14) will have values varying with the value of $\theta$, and hence the solution of the type (a) in the case of the binomial does not exist.
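For small $n$ the argument can be checked by brute force. The sketch below is an illustration and not part of the proof. It enumerates every subset of $\{0, 1, \ldots, n\}$ as a candidate set of acceptance and evaluates the sum (14) exactly, using Python's `fractions.Fraction`, at $n + 1$ distinct values of $\theta$ in $(0, 1)$. A polynomial of degree at most $n$ that takes a single value at $n + 1$ points is constant, so a subset is reported only if its sum does not depend on $\theta$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def coverage_values(accept_set, n, thetas):
    """Exact value of sum_{x in accept_set} p(x | theta) at each theta."""
    return [
        sum(comb(n, x) * t**x * (1 - t) ** (n - x) for x in accept_set)
        for t in thetas
    ]


def constant_acceptance_sets(n):
    """All subsets of {0, ..., n} whose binomial coverage is the same for every theta."""
    thetas = [Fraction(k, n + 2) for k in range(1, n + 2)]  # n + 1 points in (0, 1)
    found = []
    for size in range(n + 2):
        for subset in combinations(range(n + 1), size):
            if len(set(coverage_values(subset, n, thetas))) == 1:
                found.append(subset)
    return found


for n in range(1, 6):
    print(n, constant_acceptance_sets(n))
```

Only two subsets survive. The empty set has sum 0, and the full set has sum 1, which is equation (16). Neither gives a constant $\alpha$ strictly between 0 and 1.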

This, I think, gives the solution of the question raised by Professor Fisher. 
It is clear also that whenever the solution of the type (a) exists, it may be 
found by a suitable choice of sets of acceptance, and thus by the method ex- 
plained in my earlier paper. 

I should like now to raise another question. Past experience shows that the general problem of estimation may be formulated in different ways. In the form in which it appears in Bayes' theorem, the problem required for its solution the knowledge of the probabilities a priori.

The form of the same problem treated by R. A. Fisher in his theory of estimation was solved in terms of a new conception, that of likelihood.

The problem of estimation in its form of confidence intervals stands entirely within the bounds of the theory of probability, without involving any conception not already inherent in this theory. In the case of continuous distributions the problem also allows the solution (a), entirely independent of the probabilities a priori. Now it is shown that the necessity of the solution (b) is bound up with the nature of the problem if the distributions are discontinuous.

My question is: is it possible to formulate the problem of estimation in a fourth form, leading to a solution which (1) stands entirely on the grounds of the classical theory of probability, and (2) does not depend upon the probabilities a priori, whatever the conditions of the problem?



ANALYSIS OF VARIANCE CONSIDERED AS AN APPLICATION OF 
SIMPLE ERROR THEORY 

By Walter A. Hendricks 

The need for an elementary presentation of the methods of analysis of variance has been recognized by many investigators in various fields of research. A recent monograph by Snedecor (1934) is undoubtedly the most comprehensive attempt to satisfy this need which has appeared in the literature relating to the subject. Snedecor’s treatment of the subject consists largely of the presentation of a number of standard types of problems to which the methods of analysis of variance are applicable, directions for performing the necessary computations, and a discussion of the conclusions which may be drawn from the data on the basis of the analysis.

In the opinion of the author of this paper, an elementary presentation of some 
of the theoretical considerations upon which the methods of analysis of variance 
are based would also be of some value. The methods of analysis of variance, 
as given by Fisher (1932), are presented as a natural consequence of intraclass 
correlation theory. However, the essential concepts may be presented in a 
more comprehensible form by the use of simple error theory. 

It seems appropriate to begin such a presentation with a definition of variance. If we have an infinite number of measurements of the same quantity, the variance of a single measurement is defined as the arithmetic mean of the squares of the errors of those measurements. In actual practice, an infinite number of measurements can never be obtained. We have instead a sample of $n$ measurements, $x_1, x_2, \ldots, x_n$, from which the variance of a single measurement may be estimated. By referring to any text on the method of least squares, it may be verified that the best estimate, $S^2$, of the variance of a single measurement which can be obtained from a sample of $n$ measurements is given by the equation:

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - m)^2 \qquad (1)$$

in which $m$ represents the arithmetic mean of the $n$ measurements. The quantity, $n - 1$, in the terminology of analysis of variance, is designated as the number of degrees of freedom available for estimating $S^2$.
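As a small illustration of equation (1), the Python sketch below computes $S^2$ with the divisor $n - 1$. The function name and the sample values are hypothetical.

```python
def variance_estimate(xs):
    """Equation (1): S^2 = sum((x_i - m)^2) / (n - 1), on n - 1 degrees of freedom."""
    n = len(xs)
    if n < 2:
        raise ValueError("at least two measurements are needed")
    m = sum(xs) / n  # arithmetic mean of the n measurements
    return sum((x - m) ** 2 for x in xs) / (n - 1)


print(variance_estimate([10.1, 9.8, 10.3, 9.9, 10.4]))
```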

It is often necessary to estimate $S^2$ from a number of different samples of measurements. In such cases, the best estimate of $S^2$ is obtained by calculating the weighted mean of the variances estimated from the individual samples. Each variance is weighted by the number of degrees of freedom which were available for its estimation. The number of degrees of freedom upon which such an estimate of $S^2$ is based is given by the sum of these weights. Such an estimate of the variance of a single measurement is often designated as the variance “within samples.”
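The pooling rule just described can be sketched as follows, assuming each sample is supplied as a list of measurements. Each sample's sum of squares equals its variance multiplied by its degrees of freedom. Summing these and dividing by the total degrees of freedom therefore gives the weighted mean of the variances. The names and data are illustrative.

```python
def pooled_within_variance(samples):
    """Variance "within samples": per-sample variances weighted by their degrees of freedom.

    Returns (estimate, degrees_of_freedom); the degrees of freedom are the sum of the weights.
    """
    total_ss = 0.0
    total_df = 0
    for xs in samples:
        n = len(xs)
        m = sum(xs) / n
        total_ss += sum((x - m) ** 2 for x in xs)  # equals (n - 1) times this sample's S^2
        total_df += n - 1
    return total_ss / total_df, total_df


print(pooled_within_variance([[1.0, 2.0, 3.0], [4.0, 6.0]]))
```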

In one of the simpler applications of analysis of variance, a number of samples 
of measurements are available, and the investigator is required to determine 
whether the magnitude of the quantity measured varied from sample to sample 
or whether all of the measurements may be regarded as having been made upon 
a quantity of the same magnitude. 

An estimate, $S^2$, of the variance within samples may be obtained. Since $S^2$ is an estimate of the variance of a single measurement, the variance, $S_i^2$, of the arithmetic mean, $m_i$, of the measurements in any one sample is given by the equation:

$$S_i^2 = \frac{S^2}{n_i} \qquad (2)$$

in which $n_i$ represents the number of measurements in the sample. Let there be $r$ samples. Then another estimate, $S_i'^2$, of the variance of the mean, $m_i$, may be obtained from the observed distribution of the means, $m_1, m_2, \ldots, m_r$, by the use of the formula for calculating the variance of a weighted observation as given in texts on the method of least squares:

$$S_i'^2 = \frac{1}{n_i(r-1)}\left[n_1(m_1 - m)^2 + n_2(m_2 - m)^2 + \cdots + n_r(m_r - m)^2\right] \qquad (3)$$

in which:

$$m = \frac{n_1m_1 + n_2m_2 + \cdots + n_rm_r}{n_1 + n_2 + \cdots + n_r} \qquad (4)$$


Equations (2) and (3) yield two estimates of the variance of the mean, $m_i$. It is apparent that these two estimates will be equal, within the limits of sampling fluctuations, if all of the measurements in the $r$ samples were made upon a quantity of the same magnitude. If the magnitude of the quantity measured varied from sample to sample, $S_i'^2$ will tend to be greater than $S_i^2$. However, in actual practice, the two estimates of the variance of a particular mean are not compared directly. An equivalent comparison is made between two estimates of the variance of a single measurement. The first of these is nothing more than the variance within samples discussed earlier in this paper. The second estimate, which may be designated by $S'^2$, is the value which would have to be substituted for $S^2$ in equation (2) in order to make $S_i^2$ equal to the value given for $S_i'^2$ by equation (3). It is quite apparent that $S'^2$ may be found by the use of the equation:

$$S'^2 = \frac{1}{r-1}\left[n_1(m_1 - m)^2 + n_2(m_2 - m)^2 + \cdots + n_r(m_r - m)^2\right] \qquad (5)$$
$S'^2$ is often designated as the variance “between samples.” A comparison of $S'^2$ with $S^2$ is obviously equivalent to a comparison of $S_i'^2$ with $S_i^2$.

If $S'^2$ is greater than $S^2$, a statistic, $z$, may be calculated:

$$z = \frac{1}{2}\log_e\frac{S'^2}{S^2} \qquad (6)$$

This statistic serves as a useful comparison between $S'^2$ and $S^2$, since its sampling distribution is known if all of the measurements comprising the data under investigation were made upon a quantity of the same magnitude. Under these conditions the distribution of $z$ is given by an equation of the form:

$$df = C\,\frac{e^{n_1z}}{\left(n_1e^{2z} + n_2\right)^{\frac{n_1+n_2}{2}}}\,dz \qquad (7)$$

in which $n_1$ represents the number of degrees of freedom available for estimating $S'^2$, $n_2$ represents the number of degrees of freedom available for estimating $S^2$, and $C$ is a constant depending only on $n_1$ and $n_2$. It is apparent from equation (5) that $r - 1$ degrees of freedom are available for the estimation of $S'^2$ in the particular problem under discussion.
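The comparison can be sketched in Python for a hypothetical set of samples. Equation (4) gives the weighted mean and equation (5) the variance between samples. The pooled within-sample variance gives $S^2$, and $z$ follows as one-half the natural logarithm of their ratio. The sketch stops at $z$. The significance would still be read from Fisher's tables with $n_1 = r - 1$ and $n_2 = \sum (n_i - 1)$ degrees of freedom.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def within_samples_variance(samples):
    """Pooled S^2: within-sample sum of squares over sum(n_i - 1) degrees of freedom."""
    ss = sum(sum((x - mean(xs)) ** 2 for x in xs) for xs in samples)
    return ss / sum(len(xs) - 1 for xs in samples)


def between_samples_variance(samples):
    """Equation (5): S'^2 = sum n_i (m_i - m)^2 / (r - 1), with m from equation (4)."""
    sizes = [len(xs) for xs in samples]
    means = [mean(xs) for xs in samples]
    m = sum(n * mi for n, mi in zip(sizes, means)) / sum(sizes)  # equation (4)
    return sum(n * (mi - m) ** 2 for n, mi in zip(sizes, means)) / (len(samples) - 1)


def fisher_z(s_between, s_within):
    """z = (1/2) ln(S'^2 / S^2)."""
    return 0.5 * math.log(s_between / s_within)


samples = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
s_within = within_samples_variance(samples)    # on 4 degrees of freedom
s_between = between_samples_variance(samples)  # on 1 degree of freedom
print(s_between, s_within, fisher_z(s_between, s_within))
```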

When any estimate of the variance of a single measurement is multiplied by the number of degrees of freedom available for making that estimate, the resulting product is known as a “sum of squares.” The additive property of the sums of squares and the degrees of freedom contributes much to the elegance of the scheme of analysis just presented. It is also of considerable practical importance in problems of a type to be discussed later in this paper. In the case of the problem discussed above, the additive property of the sums of squares provides that the sum of the “sum of squares between samples” and the “sum of squares within samples” is equal to the sum of the squares of the deviations of all of the measurements from their arithmetic mean. The additive property of the degrees of freedom provides that the sum of the “degrees of freedom between samples” and the “degrees of freedom within samples” is equal to the “total degrees of freedom,” which is nothing more than the total number of measurements diminished by unity.
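The additive property can be verified numerically. The sketch below uses made-up data with samples of unequal size. It returns each sum of squares together with its degrees of freedom. The between and within parts add up to the total, apart from floating-point rounding.

```python
def partition(samples):
    """(sum of squares, degrees of freedom) for the total, between-samples and within-samples parts."""
    all_x = [x for xs in samples for x in xs]
    grand = sum(all_x) / len(all_x)
    means = [sum(xs) / len(xs) for xs in samples]
    total = (sum((x - grand) ** 2 for x in all_x), len(all_x) - 1)
    between = (
        sum(len(xs) * (m - grand) ** 2 for xs, m in zip(samples, means)),
        len(samples) - 1,
    )
    within = (
        sum((x - m) ** 2 for xs, m in zip(samples, means) for x in xs),
        len(all_x) - len(samples),
    )
    return total, between, within


total, between, within = partition([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [2.0, 9.0]])
print(total, between, within)
```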

The methods of analysis presented above may be applied to any study of the effects of a number of experimental treatments of the same kind upon the magnitude of a measurable quantity. If experimental treatments of more than one kind are imposed simultaneously, the effects of each may be studied by modifications of those methods. The discussion of those modifications, about to be presented in this paper, is limited to data which may be classified in an “$r \times s$” table, i.e., to studies of the effects of only two kinds of experimental treatments. More complex problems may be treated by simple extensions of the methods presented.

Consider an “$r \times s$” table composed of $rs$ cells, each of which contains a number of measurements of some quantity. The magnitude of the quantity measured may vary from cell to cell, but the essential conditions under which the measurements were made must be the same for all cells. It is also understood that no cell may be empty. Table 1 is an example of such a table. The individual measurements have not been represented. Only the number of measurements, $n_{ij}$, in each cell and the arithmetic mean, $m_{ij}$, of those measurements have been indicated. The arguments, $a_i$, represent $r$ experimental treatments of one kind, and the arguments, $b_j$, represent $s$ experimental treatments of another kind. The problem to be solved is to ascertain whether or not the differences among the experimental treatments of each kind had any effect on the magnitude of the quantity measured.


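TABLE 1

Example of an “$r \times s$” Table Showing Only the Number of Measurements in Each Cell and the Arithmetic Mean of Those Measurements

| | $b_1$ | $b_2$ | $b_3$ | $b_4$ | $\cdots$ | $b_s$ |
|---|---|---|---|---|---|---|
| $a_1$ | $m_{11}$, $n_{11}$ | $m_{12}$, $n_{12}$ | $m_{13}$, $n_{13}$ | $m_{14}$, $n_{14}$ | $\cdots$ | $m_{1s}$, $n_{1s}$ |
| $a_2$ | $m_{21}$, $n_{21}$ | $m_{22}$, $n_{22}$ | $m_{23}$, $n_{23}$ | $m_{24}$, $n_{24}$ | $\cdots$ | $m_{2s}$, $n_{2s}$ |
| $a_3$ | $m_{31}$, $n_{31}$ | $m_{32}$, $n_{32}$ | $m_{33}$, $n_{33}$ | $m_{34}$, $n_{34}$ | $\cdots$ | $m_{3s}$, $n_{3s}$ |
| $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | $\vdots$ | | $\vdots$ |
| $a_r$ | $m_{r1}$, $n_{r1}$ | $m_{r2}$, $n_{r2}$ | $m_{r3}$, $n_{r3}$ | $m_{r4}$, $n_{r4}$ | $\cdots$ | $m_{rs}$, $n_{rs}$ |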


If each cell contains the same number of measurements, the effects of the experimental treatments indicated by the arguments, $a_i$, may be studied by comparing the variance “between rows” with the variance “within cells.” The variance between rows may be calculated by regarding the $r$ rows as $r$ samples of measurements and applying an equation of the same form as equation (5). The variance within cells may be obtained by calculating the variance of a single measurement from the data in each cell separately and taking the mean of the resulting values. The effects of the experimental treatments indicated by the arguments, $b_j$, may be studied by comparing the variance “between columns” with the variance “within cells.”

If the degrees of freedom between rows, between columns, and within cells are added, the sum will be less than the total number of degrees of freedom in the table. If the corresponding sums of squares are added, the sum is likely to be less than the total sum of squares. The differences are due to what is customarily designated as “interaction between rows and columns.” The more descriptive term, “differential response,” is sometimes used to designate the same factor. The nature of this factor may be investigated by considering the effects of the experimental treatments, $b_j$, in each row of Table 1.

The data in each cell of Table 1 may be regarded as a sample of measurements. Therefore, the data in any row may be regarded as a set of $s$ samples of measurements. By applying an equation of the same form as equation (5) to the data in any row, an estimate of the variance of a single measurement is obtained from the observed distribution of the means of the cells in that row. By calculating the arithmetic mean of the estimates for the $r$ rows, an estimate of the variance of a single measurement is obtained from $r(s-1)$ degrees of freedom. This estimate may be designated as the variance “between cells in the same row.”

The variance between cells in the same row measures the average effect of differences among the experimental treatments, $b_j$, in individual rows. The variance between columns, which was discussed earlier in this paper, is calculated from $s - 1$ degrees of freedom. It measures the effect of differences among the treatments, $b_j$, on the assumption that the effect of any one treatment upon the magnitude of the quantity measured was constant for every row. The number of degrees of freedom assignable to differential response of the various rows to the treatments, $b_j$, is $r(s-1) - (s-1)$, or $(r-1)(s-1)$. The sum of squares due to differential response is given by the difference between the sum of squares between cells in the same row and the sum of squares between columns. These relations follow from the additive property of degrees of freedom and sums of squares.
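For a table with the same number $n$ of measurements in every cell, the decomposition described above can be sketched as follows. The function name, the dictionary labels and the 2 × 3 data are hypothetical. The sum of squares for differential response is obtained exactly as the text prescribes. It is the difference between the sum of squares between cells in the same row and the sum of squares between columns, with $(r-1)(s-1)$ degrees of freedom.

```python
def cell_mean(xs):
    return sum(xs) / len(xs)


def two_way_balanced(cells):
    """Sums of squares and degrees of freedom for an r x s table with n measurements per cell.

    cells[i][j] is the list of measurements for treatments a_i and b_j.
    """
    r, s, n = len(cells), len(cells[0]), len(cells[0][0])
    m = [[cell_mean(c) for c in row] for row in cells]
    row_means = [sum(row) / s for row in m]
    col_means = [sum(m[i][j] for i in range(r)) / r for j in range(s)]
    grand = sum(row_means) / r
    ss_rows = n * s * sum((x - grand) ** 2 for x in row_means)
    ss_cols = n * r * sum((x - grand) ** 2 for x in col_means)
    # Equation (5) applied to the s cells of each row, summed over the r rows.
    ss_cells_in_row = n * sum(
        (m[i][j] - row_means[i]) ** 2 for i in range(r) for j in range(s)
    )
    ss_within = sum(
        (x - m[i][j]) ** 2 for i in range(r) for j in range(s) for x in cells[i][j]
    )
    return {
        "between rows": (ss_rows, r - 1),
        "between columns": (ss_cols, s - 1),
        "between cells in the same row": (ss_cells_in_row, r * (s - 1)),
        "differential response": (ss_cells_in_row - ss_cols, (r - 1) * (s - 1)),
        "within cells": (ss_within, r * s * (n - 1)),
    }


table = [[[1.0, 3.0], [2.0, 4.0], [6.0, 8.0]],
         [[5.0, 5.0], [4.0, 6.0], [7.0, 9.0]]]
for source, (ss, df) in two_way_balanced(table).items():
    print(f"{source:30s} {ss:8.3f} {df:3d}")
```

For these data, the rows, columns, differential response and within-cells parts add up to the total sum of squares of all twelve measurements about their mean, with 11 degrees of freedom.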

It may be observed that precisely the same results would be obtained by considering the effects of the treatments, $a_i$, in the various columns of Table 1. The degrees of freedom and sum of squares due to differential response of the various columns to the treatments, $a_i$, would be exactly equal to the corresponding values obtained for the differential response of the various rows to the treatments, $b_j$.

Up to this point the discussion has been concerned only with the special case in which each cell of Table 1 contains the same number of measurements. As a matter of fact, the methods given for the analysis of such data will yield correct results when applied to any “$r \times s$” table that meets two conditions. The numbers of measurements in the cells in every row must be proportional to the corresponding marginal totals for the columns. The numbers of measurements in the cells in every column must be proportional to the corresponding marginal totals for the rows.

When the numbers of measurements in the various cells do not satisfy the above condition of proportionality, the distributions of the means of the rows and columns may be distorted, and, consequently, the methods of analysis described above may yield incorrect results. Efficient methods of analyzing such data have been presented by Yates (1933). A comprehensive discussion of these methods is considerably beyond the scope of this paper. One method, described very briefly by Yates (1933) and designated as the “method of weighted squares of means,” appealed to the author as being particularly valuable for practical work. No detailed discussion of the method seems to be available in the literature. Therefore, the following presentation may be of some interest.

Consider the experimental treatments represented by the arguments, $a_i$, in Table 1. It is necessary to find an average value for the magnitude of the quantity measured for each row of Table 1. However, this average must be of such a type that its value will not be distorted by the unequal numbers of measurements in the various cells. The unweighted arithmetic mean of the means of the cells in the row seems to be the logical average to use. Within the limits of sampling fluctuations, the value of this average will be identical with the value which would have been obtained if each cell had contained the same number of measurements. The averages for the $r$ rows are:


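$$\begin{aligned}
m_{a_1} &= \frac{1}{s}\left(m_{11} + m_{12} + \cdots + m_{1s}\right)\\
m_{a_2} &= \frac{1}{s}\left(m_{21} + m_{22} + \cdots + m_{2s}\right)\\
&\;\;\vdots\\
m_{a_r} &= \frac{1}{s}\left(m_{r1} + m_{r2} + \cdots + m_{rs}\right)
\end{aligned} \qquad (8)$$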


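By the law of propagation of error, the variance of any one of these unweighted means is given by the equation:

$$S^2_{m_{a_i}} = \frac{1}{s^2}\left(S^2_{m_{i1}} + S^2_{m_{i2}} + \cdots + S^2_{m_{is}}\right) \qquad (9)$$

in which $S^2_{m_{a_i}}$ is the variance of $m_{a_i}$, and $S^2_{m_{i1}}, S^2_{m_{i2}}, \ldots, S^2_{m_{is}}$ are the variances of $m_{i1}, m_{i2}, \ldots, m_{is}$, respectively. If $S^2$ represents the variance of a single measurement, equation (9) may be written in the form:

$$S^2_{m_{a_i}} = \frac{1}{s^2}\left(\frac{1}{n_{i1}} + \frac{1}{n_{i2}} + \cdots + \frac{1}{n_{is}}\right)S^2 \qquad (10)$$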


The value of $S^2$ may be estimated from the individual measurements in the various cells. $S^2$ is nothing more than the variance within cells, as customarily calculated, and may be estimated from the $N - rs$ degrees of freedom within cells, in which $N$ represents the total number of measurements in Table 1.

The variance of a single measurement may also be estimated from the observed distribution of the means of the type, $m_{a_i}$. These means are not of equal weight. Therefore, in order to find the variance of any one of them, it is first necessary to calculate the weighted mean of the $r$ individual means. Since the weight of an arithmetic mean is inversely proportional to its variance, it is evident from an inspection of equation (10) that the weight, $p_{a_i}$, of a mean, $m_{a_i}$, may be found from the equation:

$$\frac{1}{p_{a_i}} = \frac{1}{n_{i1}} + \frac{1}{n_{i2}} + \cdots + \frac{1}{n_{is}} \qquad (11)$$

The weighted mean, $m_a$, may then be found:

$$m_a = \frac{p_{a_1}m_{a_1} + p_{a_2}m_{a_2} + \cdots + p_{a_r}m_{a_r}}{p_{a_1} + p_{a_2} + \cdots + p_{a_r}} \qquad (12)$$


The variance of any mean, $m_{a_i}$, as estimated from the observed distribution of means of this type, is given by:

$$S'^2_{m_{a_i}} = \frac{1}{p_{a_i}(r-1)}\left[p_{a_1}(m_{a_1} - m_a)^2 + p_{a_2}(m_{a_2} - m_a)^2 + \cdots + p_{a_r}(m_{a_r} - m_a)^2\right] \qquad (13)$$

Substitute $S'^2_{m_{a_i}}$ for $S^2_{m_{a_i}}$ and $S_a^2$ for $S^2$ in equation (10), and solve the resulting equation for $S_a^2$. This yields an estimate, $S_a^2$, of the variance of a single measurement, obtained from the observed distribution of means of the type, $m_{a_i}$. It is evident that, after making the indicated substitutions, equation (10) reduces to the form:

$$S_a^2 = \frac{s^2}{r-1}\left[p_{a_1}(m_{a_1} - m_a)^2 + p_{a_2}(m_{a_2} - m_a)^2 + \cdots + p_{a_r}(m_{a_r} - m_a)^2\right] \qquad (14)$$


It is interesting to observe that, if the numbers of measurements in the respective cells were equal, equation (14) would reduce to the formula for calculating the variance “between rows” as customarily applied in analysis of variance.

The two estimates, $S_a^2$ and $S^2$, of the variance of a single measurement may be compared in the usual manner. One-half of the natural logarithm of the ratio of the larger estimate to the smaller is taken, and use is made of the tables of the values of “$z$” given by Fisher (1932). When using these tables, it is important to remember that $S_a^2$ was estimated from $r - 1$ degrees of freedom.
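Steps (8), (11), (12) and (14) can be collected into a short Python sketch. It assumes the cell means $m_{ij}$ and counts $n_{ij}$ are supplied as nested lists with no empty cells. The sketch illustrates the method under those assumptions and is not code from the paper. When every cell has the same count, the result coincides with the usual variance between rows, as noted above.

```python
def weighted_squares_of_means_rows(m, counts):
    """Method of weighted squares of means for the row treatments a_i.

    m[i][j]: mean of cell (a_i, b_j); counts[i][j]: its number of measurements n_ij.
    Returns (S_a^2, degrees of freedom r - 1).
    """
    r, s = len(m), len(m[0])
    unweighted = [sum(row) / s for row in m]                          # equation (8)
    weights = [1.0 / sum(1.0 / n for n in row) for row in counts]     # equation (11)
    m_a = sum(p * x for p, x in zip(weights, unweighted)) / sum(weights)  # equation (12)
    s_a2 = s**2 / (r - 1) * sum(
        p * (x - m_a) ** 2 for p, x in zip(weights, unweighted)
    )                                                                 # equation (14)
    return s_a2, r - 1


means = [[2.0, 3.0, 7.0], [5.0, 5.0, 8.0]]
print(weighted_squares_of_means_rows(means, [[2, 2, 2], [2, 2, 2]]))  # equal counts
print(weighted_squares_of_means_rows(means, [[1, 2, 4], [2, 2, 2]]))  # unequal counts
```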

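The method of analysis just described may be employed to study the effects of differences among the experimental treatments indicated by the arguments, $b_j$, on the magnitude of the quantity measured. The unweighted means for the $s$ columns are:

$$\begin{aligned}
m_{b_1} &= \frac{1}{r}\left(m_{11} + m_{21} + \cdots + m_{r1}\right)\\
m_{b_2} &= \frac{1}{r}\left(m_{12} + m_{22} + \cdots + m_{r2}\right)\\
&\;\;\vdots\\
m_{b_s} &= \frac{1}{r}\left(m_{1s} + m_{2s} + \cdots + m_{rs}\right)
\end{aligned} \qquad (15)$$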
The weight, $p_{b_j}$, of a mean of the type, $m_{b_j}$, may be found from the relation:

$$\frac{1}{p_{b_j}} = \frac{1}{n_{1j}} + \frac{1}{n_{2j}} + \cdots + \frac{1}{n_{rj}} \qquad (16)$$

A weighted mean, $m_b$, may be calculated:

$$m_b = \frac{p_{b_1}m_{b_1} + p_{b_2}m_{b_2} + \cdots + p_{b_s}m_{b_s}}{p_{b_1} + p_{b_2} + \cdots + p_{b_s}} \qquad (17)$$

An estimate, $S_b^2$, of the variance of a single measurement may be obtained from the observed distribution of means of the type, $m_{b_j}$, by the use of the equation:

$$S_b^2 = \frac{r^2}{s-1}\left[p_{b_1}(m_{b_1} - m_b)^2 + p_{b_2}(m_{b_2} - m_b)^2 + \cdots + p_{b_s}(m_{b_s} - m_b)^2\right] \qquad (18)$$

$S_b^2$ may be compared with $S^2$ in the usual manner.

If it is necessary to study the “interaction between rows and columns,” the effects of the experimental treatments, $b_j$, may be studied for each individual row of Table 1. Consider the distribution of the means of the cells in a row designated by the argument, $a_i$. The weight of any one of these means is equal to the number of measurements in the cell. A weighted mean, $m'_i$, of the $s$ means of cells in the row may be calculated:

$$m'_i = \frac{n_{i1}m_{i1} + n_{i2}m_{i2} + \cdots + n_{is}m_{is}}{n_{i1} + n_{i2} + \cdots + n_{is}} \qquad (19)$$

The variance, $S'^2_{m_{ij}}$, of the mean, $m_{ij}$, for any cell in the given row, as estimated from the observed distribution of means of this type, may be obtained from the equation:

$$S'^2_{m_{ij}} = \frac{1}{n_{ij}(s-1)}\left[n_{i1}(m_{i1} - m'_i)^2 + n_{i2}(m_{i2} - m'_i)^2 + \cdots + n_{is}(m_{is} - m'_i)^2\right] \qquad (20)$$

The variance, $S^2_{m_{ij}}$, of the same mean, as estimated from the distribution of the individual measurements in the cell, may be obtained from the equation:

$$S^2_{m_{ij}} = \frac{S^2}{n_{ij}} \qquad (21)$$

Substitute $S'^2_{m_{ij}}$ for $S^2_{m_{ij}}$ and $S^2_{ib}$ for $S^2$ in equation (21), and solve the resulting equation for $S^2_{ib}$. This yields an estimate, $S^2_{ib}$, of the variance of a single measurement, obtained from the observed distribution of the means of the cells in the given row. After making the indicated substitutions, equation (21) reduces to the form:

$$S^2_{ib} = \frac{1}{s-1}\left[n_{i1}(m_{i1} - m'_i)^2 + n_{i2}(m_{i2} - m'_i)^2 + \cdots + n_{is}(m_{is} - m'_i)^2\right] \qquad (22)$$
Such an estimate, $S^2_{ib}$, of the variance of a single measurement may be obtained for each of the $r$ rows in Table 1. The average, $S^2_{ab}$, of the variances of the type, $S^2_{ib}$, is then calculated. It is an estimate of the variance of a single measurement obtained from the $r(s-1)$ degrees of freedom between cells in the same row:

$$S^2_{ab} = \frac{1}{r(s-1)}\sum_{i=1}^{r}\left[n_{i1}(m_{i1} - m'_i)^2 + n_{i2}(m_{i2} - m'_i)^2 + \cdots + n_{is}(m_{is} - m'_i)^2\right] \qquad (23)$$


Equation (23) is identical with the formula for calculating the variance between cells in the same row as ordinarily applied in analysis of variance. This result is a direct consequence of the fact that the unequal numbers of measurements in the various cells had no distorting effect on the arithmetic means for individual cells.
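Equations (19), (22) and (23) translate directly into code. The sketch below assumes a nested-list layout for $m_{ij}$ and $n_{ij}$, which is a convention adopted here for illustration. It returns $S^2_{ab}$ with its $r(s-1)$ degrees of freedom.

```python
def between_cells_same_row(m, counts):
    """Equation (23): average of the per-row estimates (22), on r(s - 1) degrees of freedom.

    m[i][j] and counts[i][j] are the mean and number of measurements of cell (a_i, b_j).
    """
    r, s = len(m), len(m[0])
    per_row = []
    for row_means, row_counts in zip(m, counts):
        # Equation (19): weighted mean of the cell means in this row.
        m_prime = sum(n * x for n, x in zip(row_counts, row_means)) / sum(row_counts)
        # Equation (22): estimate of the variance of a single measurement from this row.
        per_row.append(
            sum(n * (x - m_prime) ** 2 for n, x in zip(row_counts, row_means)) / (s - 1)
        )
    return sum(per_row) / r, r * (s - 1)


print(between_cells_same_row([[2.0, 3.0, 7.0], [5.0, 5.0, 8.0]], [[1, 2, 4], [2, 2, 2]]))
```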

The presence or absence of interaction may be verified by comparing $S^2_{ab}$ with $S_b^2$. In general, the actual variance due to interaction cannot be obtained by the “weighted squares of means” method, for the various sums of squares do not possess the additive property when the analysis is made in this way. However, the comparison suggested above will yield sufficient information for most practical purposes.

For the special case in which $r$ or $s$ is equal to 2, the actual variance due to interaction may be calculated. Suppose $r = 2$ in Table 1. The following method, suggested by Yates (1933), yields an estimate of the variance due to interaction from a consideration of the differences, $d_j$, between the means of the two cells in each column:

$$\begin{aligned}
d_1 &= m_{11} - m_{21}\\
d_2 &= m_{12} - m_{22}\\
&\;\;\vdots\\
d_s &= m_{1s} - m_{2s}
\end{aligned} \qquad (24)$$


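The variance, $S^2_{d_j}$, of any difference, $d_j$, is given by the equation:

$$S^2_{d_j} = \left(\frac{1}{n_{1j}} + \frac{1}{n_{2j}}\right)S^2 \qquad (25)$$

The weight, $p_j$, of the difference, $d_j$, is given by the equation:

$$\frac{1}{p_j} = \sum_{i=1}^{2}\frac{1}{n_{ij}} \qquad (26)$$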


The variance of the difference, $d_j$, as estimated from the observed distribution of differences, is given by the equation:

$$S'^2_{d_j} = \frac{1}{p_j(s-1)}\left[p_1(d_1 - \bar{d})^2 + p_2(d_2 - \bar{d})^2 + \cdots + p_s(d_s - \bar{d})^2\right] \qquad (27)$$

in which:

$$\bar{d} = \frac{p_1d_1 + p_2d_2 + \cdots + p_sd_s}{p_1 + p_2 + \cdots + p_s} \qquad (28)$$

By means of these relations, an estimate, $S_I^2$, of the variance of a single measurement may be obtained from the observed distribution of the differences of the type, $d_j$. This estimate represents the variance due to interaction and may be obtained from the equation:

$$S_I^2 = \frac{1}{s-1}\left[p_1(d_1 - \bar{d})^2 + p_2(d_2 - \bar{d})^2 + \cdots + p_s(d_s - \bar{d})^2\right] \qquad (29)$$

It is quite apparent that $s - 1$ degrees of freedom are available for the estimation of the variance due to interaction in this particular example.
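Yates's device for $r = 2$ can be sketched as follows. The input layout (two rows of cell means and counts) and the function name are assumptions made for illustration. With equal cell counts the sketch reproduces the interaction mean square of the ordinary analysis.

```python
def interaction_two_rows(m, counts):
    """Equations (24)-(29) for r = 2: interaction variance from the column differences d_j.

    m[i][j] and counts[i][j] (i = 0, 1) are the cell means and numbers of measurements.
    Returns the estimate and its s - 1 degrees of freedom.
    """
    s = len(m[0])
    d = [m[0][j] - m[1][j] for j in range(s)]                                  # (24)
    p = [1.0 / (1.0 / counts[0][j] + 1.0 / counts[1][j]) for j in range(s)]  # (26)
    d_bar = sum(pj * dj for pj, dj in zip(p, d)) / sum(p)                      # (28)
    estimate = sum(pj * (dj - d_bar) ** 2 for pj, dj in zip(p, d)) / (s - 1)  # (29)
    return estimate, s - 1


print(interaction_two_rows([[2.0, 3.0, 7.0], [5.0, 5.0, 8.0]], [[1, 2, 4], [2, 2, 2]]))
```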
NOTE ON THE DISTRIBUTIONS OF THE STANDARD DEVIATIONS
AND SECOND MOMENTS OF SAMPLES FROM A 
GRAM-CHARLIER POPULATION 

By G. A. Baker 


T. N. Thiele in his “Theory of Observations”¹ makes the following statement with regard to the distributions of the higher half-invariants in samples of $n$: “Not even for $\mu_2$ have I discovered the general law of errors.” The purpose of this paper is to shed some light on the distribution of $\mu_2$. It also gives the distribution of second moments about a fixed point when the sampled population can be represented by a Gram-Charlier series.

The distribution of the second moments about a fixed point of samples is given in complete generality. It is known that if the sampled population is normal there is a simple relation between the distribution of the standard deviations of samples of $n$ and the distribution of the second moments of the samples about the mean of the population. It was thought that such a relation might exist in case the sampled population could be represented by a Gram-Charlier series. Such is not the case. Again, it was thought that by obtaining the distribution of the standard deviations for samples of 2, 3, 4, $\ldots$ it might be possible to deduce empirically a general law of distribution. This proved an unfruitful line of investigation. It required so much labor, however, that the results should be reported to save others time and energy.

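First, suppose that a population may be represented as

$$f(x) = a_0\varphi_0(x) + a_3\varphi_3(x) + a_5\varphi_5(x) + \cdots \qquad (1)$$

where

$$\varphi_i(x) = \frac{d^{i}}{dx^{i}}\left(\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}\right).$$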
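Numerically, the functions $\varphi_i$ can be evaluated through the identity $\varphi_k(x) = (-1)^k\,\mathrm{He}_k(x)\,\varphi_0(x)$. Here $\mathrm{He}_k$ are the probabilists' Hermite polynomials, which satisfy $\mathrm{He}_{k+1}(x) = x\,\mathrm{He}_k(x) - k\,\mathrm{He}_{k-1}(x)$. The Python sketch below assumes the unit-variance form of $\varphi_0$ written above. The function names and coefficients are illustrative.

```python
import math


def hermite_he(k, x):
    """Probabilists' Hermite polynomial He_k(x) via He_{j+1} = x He_j - j He_{j-1}."""
    prev, cur = 1.0, x  # He_0, He_1
    if k == 0:
        return prev
    for j in range(1, k):
        prev, cur = cur, x * cur - j * prev
    return cur


def phi(k, x):
    """phi_k(x): k-th derivative of the unit normal density, (-1)^k He_k(x) phi_0(x)."""
    phi0 = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    return (-1) ** k * hermite_he(k, x) * phi0


def gram_charlier(x, coeffs):
    """f(x) = sum of a_k phi_k(x) for a dict {k: a_k}."""
    return sum(a * phi(k, x) for k, a in coeffs.items())


print(gram_charlier(0.5, {0: 1.0, 3: 0.1, 4: 0.05}))
```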




Then, applying Theorem II of the author’s paper on “Random Sampling from Non-Homogeneous Populations,”² we deduce at once the following theorem.

Theorem I. Consider the second moments about the origin of (1) of samples of $n$ drawn at random from a population represented by (1). Their distribution is precisely the same as the distribution of the second moments about the same point of samples of $n$ drawn from a population represented by the first term of (1), that is, a normal population. It is proportional to $x^{\frac{n-2}{2}}e^{-\frac{nx}{2}}$ (loc. cit.).


¹ Thiele, T. N., “The Theory of Observations,” reprinted in the Annals of Mathematical Statistics, Vol. 2, No. 2, May, 1931, p. 208.

² Metron, Vol. 8, No. 3, Feb. 28, 1930.
This is not so surprising as it may seem at first if it is remembered that the odd subscript terms of a Gram-Charlier series slice off frequencies on one side of the mean of $a_0\varphi_0(x)$ and add them onto the other side in the same manner.
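For samples of 2 this can be checked numerically. Write $x_1 = \rho\cos t$ and $x_2 = \rho\sin t$. Because $\rho\,d\rho = dx$, the density of the second moment $x = \rho^2/2$ about the origin is the integral of $f(x_1)f(x_2)$ over $t$. The sketch below assumes a population $a_0\varphi_0 + a_3\varphi_3 + a_4\varphi_4$ with illustrative coefficients. It shows that adding the odd term $a_3\varphi_3$ leaves that integral unchanged. The equally spaced rule is exact here, up to rounding, because the integrand is $e^{-\rho^2/2}$ times a trigonometric polynomial of low degree.

```python
import math


def phi0(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def gc_density(x, a0, a3, a4):
    """a0*phi_0 + a3*phi_3 + a4*phi_4, using phi_3 = -He_3 phi_0 and phi_4 = He_4 phi_0."""
    he3 = x**3 - 3 * x
    he4 = x**4 - 6 * x**2 + 3
    return phi0(x) * (a0 - a3 * he3 + a4 * he4)


def ring_integral(rho, a0, a3, a4, points=64):
    """Integral over the angle t of f(rho cos t) f(rho sin t), i.e. the density of x = rho^2 / 2."""
    step = 2 * math.pi / points
    return step * sum(
        gc_density(rho * math.cos(k * step), a0, a3, a4)
        * gc_density(rho * math.sin(k * step), a0, a3, a4)
        for k in range(points)
    )


for rho in (0.5, 1.0, 2.0, 3.0):
    with_odd = ring_integral(rho, 1.0, 0.2, 0.1)
    without_odd = ring_integral(rho, 1.0, 0.0, 0.1)
    print(rho, with_odd - without_odd)  # differences are at rounding level
```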
If we suppose that a population is given as

$$f(x) = a_0\varphi_0(x) + a_3\varphi_3(x) + a_4\varphi_4(x) + \cdots \qquad (2)$$

in the same manner we get the following theorem.

Theorem II. The distribution of the second moments measured from the origin of (2) of samples of $n$ drawn at random from (2) will be a combination of distributions of the type of Theorem I, with only even subscript terms contributing anything. The component distributions will differ in their constant factors and in the exponent of $x$, the estimate of the second moment. The lowest exponent will be $\frac{n-2}{2}$.

For instance, if

$$f(x) = a_0\varphi_0(x) + a_3\varphi_3(x) + a_4\varphi_4(x) \qquad (3)$$

and $n = 2$, the estimates of the second moment will be distributed as proportional to

$$e^{-x}\Big[(a_0 + 3a_4)^2 - 12a_4(a_0 + 3a_4)x + 3a_4(a_0 + 9a_4)x^2 - 6a_4^2x^3 + \tfrac{3}{8}a_4^2x^4\Big].$$


Thus, it can be said that we know the distribution of the second moments of samples about a fixed point if the sampled population is of the Gram-Charlier type. Given the number of terms necessary for an adequate representation and the number in the samples, we can write down the desired distribution. However, this is not a simple matter. Further, if some relation existed between the distributions of the second moments about a fixed point and the standard deviations of the samples, we would know the latter distribution also. Such a relation is not apparent for samples of 2 and 3.

Let us investigate the correlation surfaces of the means and standard deviations of samples of 2 and 3 drawn at random from a population represented by the first few terms of a Gram-Charlier series, after the method of Dr. A. T. Craig.³ The distributions of the standard deviations can then be obtained immediately by integration.

Suppose that

$$f(x) = a_0\varphi_0(x) + a_3\varphi_3(x) + a_4\varphi_4(x) \qquad (4)$$

³ Annals of Mathematical Statistics, Vol. 3, No. 2, May, 1932, pp. 126-140.
and that we are considering samples of 2. The probability of the concurrence of $x_1$ and $x_2$ is

$$f(x_1)f(x_2) \qquad (5)$$

and

$$x_1 = -s + \bar{x}, \qquad x_2 = s + \bar{x}, \qquad (6)$$

where $s$ is the standard deviation and $\bar{x}$ is the mean of a sample of 2. By means of (6), (5) becomes

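$$\begin{aligned}
\frac{2}{\pi}e^{-(x^2+s^2)}\Big[\,&a_0^2 + a_0a_3(-6s^2x - 2x^3 + 6x)\\
&+ a_0a_4(2s^4 + 12s^2x^2 - 12s^2 + 2x^4 - 12x^2 + 6)\\
&+ a_3^2(-s^6 + 3s^4x^2 + 6s^4 - 3s^2x^4 - 9s^2 + x^6 - 6x^4 + 9x^2)\\
&+ a_3a_4(2s^6x - 6s^4x^3 - 6s^4x + 6s^2x^5 - 12s^2x^3 + 18s^2x - 2x^7 + 18x^5 - 42x^3 + 18x)\\
&+ a_4^2(s^8 - 4s^6x^2 - 12s^6 + 6s^4x^4 + 12s^4x^2 + 42s^4 - 4s^2x^6 + 12s^2x^4 - 36s^2x^2 - 36s^2\\
&\qquad + x^8 - 12x^6 + 42x^4 - 36x^2 + 9)\Big], \qquad s \ge 0.
\end{aligned}\tag{7}$$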

To find the distribution of $s$ we must integrate from $-\infty$ to $\infty$ with respect to $x$. Thus (8) is obtained:


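$$\begin{aligned}
\frac{2}{\sqrt{\pi}}e^{-s^2}\Big[\,&a_0^2 + a_0a_4\left(2s^4 - 6s^2 + \tfrac{3}{2}\right) + a_3^2\left(-s^6 + \tfrac{15}{2}s^4 - \tfrac{45}{4}s^2 + \tfrac{15}{8}\right)\\
&+ a_4^2\left(s^8 - 14s^6 + \tfrac{105}{2}s^4 - \tfrac{105}{2}s^2 + \tfrac{105}{16}\right)\Big].
\end{aligned}\tag{8}$$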
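Equation (8) can be checked in the same way. Assuming, as before, that $\varphi_3 = -(x^3-3x)\varphi_0$ and $\varphi_4 = (x^4-6x^2+3)\varphi_0$, the substitution $x_1 = x - s$, $x_2 = x + s$ with $s \ge 0$ gives the joint density $4f(x-s)f(x+s)$: a Jacobian of 2, doubled because both orderings of $x_1$ and $x_2$ fold onto $s \ge 0$. The density of $s$ is the integral of this over $x$. The sketch below does that integral with a trapezoid rule and compares it with the closed form written out in `s_density_closed`. Helper names are illustrative.

```python
import math

# Illustrative sketch; function names are not from the paper.

def gram_charlier(t, a0, a3, a4):
    phi0 = math.exp(-t * t / 2) / math.sqrt(2 * math.pi)
    return phi0 * (a0 - a3 * (t**3 - 3 * t) + a4 * (t**4 - 6 * t**2 + 3))

def s_density_numeric(s, a0, a3, a4, lo=-12.0, hi=12.0, steps=4800):
    """Integrate the joint density 4 f(x - s) f(x + s) over the sample mean x."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * gram_charlier(x - s, a0, a3, a4) * gram_charlier(x + s, a0, a3, a4)
    return 4 * total * h

def s_density_closed(s, a0, a3, a4):
    """Equation (8): the odd a0*a3 and a3*a4 terms of (7) integrate to zero."""
    bracket = (a0**2
               + a0 * a4 * (2 * s**4 - 6 * s**2 + 1.5)
               + a3**2 * (-s**6 + 7.5 * s**4 - 11.25 * s**2 + 1.875)
               + a4**2 * (s**8 - 14 * s**6 + 52.5 * s**4 - 52.5 * s**2 + 105 / 16))
    return 2 / math.sqrt(math.pi) * math.exp(-s * s) * bracket

print(s_density_numeric(0.8, 1.0, 0.15, 0.08), s_density_closed(0.8, 1.0, 0.15, 0.08))
```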




If we retain only two terms of (3), i.e. use

$$f(x) = a_0\varphi_0(x) + a_3\varphi_3(x), \tag{9}$$

and consider samples of 3, we obtain as the correlation surface of $x$ and $s$

18^ ,.g-i(3xM.3.=,ra-’ - (-40r’ + 2 Its' - 2-1t) 

V3 L 4 

+ “""i (_84,s® + 525 x's< - 2752x*s^ 

64 

(10) + 576s^ - lOOSx^s' - 288, s' - 5588x« + 270x^ - 1728x') 

+ ^ (28s* - 6189x's* - 28.Cs' - 629x* + 288s* + 1344x* 

64 

+ 4608t's' - 288s' + 729x')l. 





The distribution of s can be obtained as before. The processes involved in 
obtaining (7) and (10) are so complicated that the general rule for writing the 
distribution of s is not apparent. Also, the relation of the distributions of s to 
the corresponding distributions of the second moments about a fixed point is 
not apparent. 

In summary, the general distributions of the second moments about a fixed point of samples from a population represented by a definite number of terms of a Gram-Charlier series, and the distributions of the standard deviations of samples of 2 and 3 from the same type of population, are given and compared. No apparent relation exists between them.



ON THE FINITE DIFFERENCES OF A POLYNOMIAL 
By I. H. Barkey 

In this paper an apparently new and convenient method of finding the successive finite differences of a polynomial is considered. If, operationally,

$$\phi(u + na) = E_a^n\,\phi(u) = (1 + \Delta_a)^n\,\phi(u),$$

then for any polynomial $f(x)$ of degree $n$

$$\begin{aligned}
f(x) &= p_0x^n + p_1x^{n-1} + \cdots + p_n\\
&= p_0(x+a)^n + q_{11}(x+a)^{n-1} + \cdots + q_{1n},\\
E_af(x) &= p_0(x+a)^n + p_1(x+a)^{n-1} + \cdots + p_n,\\
\Delta_af(x) &= (p_1 - q_{11})(x+a)^{n-1} + (p_2 - q_{12})(x+a)^{n-2} + \cdots + (p_n - q_{1n}).
\end{aligned}$$

Similarly, if $f_1(x) = \Delta_af(x)$, then

$$\begin{aligned}
f_1(x) &= (p_1 - q_{11})(x+2a)^{n-1} + q_{22}(x+2a)^{n-2} + \cdots + q_{2n},\\
E_af_1(x) &= (p_1 - q_{11})(x+2a)^{n-1} + (p_2 - q_{12})(x+2a)^{n-2} + \cdots + (p_n - q_{1n}),\\
\Delta_af_1(x) &= (p_2 - q_{12} - q_{22})(x+2a)^{n-2} + \cdots + (p_n - q_{1n} - q_{2n}),
\end{aligned}$$

and so on for the higher orders, since $\Delta_af_{r-1}(x) = \Delta_a^rf(x)$. In the practical application of this method, $a$ may conveniently be taken as unity, and an abridged form of synthetic division employed. Thus, if

$$f(x) = 5x^4 + 3x^3 + 7x^2 - 2x + 3,$$

then

   5    +3    +7    -2   :    +3   = f
        -2    +9   -11   :   +14
        -7   +16   -27   :
       -12   +28
       -17

  20   -21   +25   :   -11   = f1
       -41   +66   :   -77
       -61  +127   :
       -81

  60  -102   :   +66   = f2
      -162   :  +228
      -222   :

 120   :  -162   = f3
       :  -282

 120   = f4
As is evident from the darkened numerals, all figures to the right of the dotted line are redundant and may be omitted. From the above,

$$\begin{aligned}
\Delta f(x) &= 20(x+1)^3 - 21(x+1)^2 + 25(x+1) - 11,\\
\Delta^2 f(x) &= 60(x+2)^2 - 102(x+2) + 66,\\
\Delta^3 f(x) &= 120(x+3) - 162,\\
\Delta^4 f(x) &= 120.
\end{aligned}$$
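The tabular work is mechanical enough to sketch in a few lines. In this sketch (function names are illustrative, not from the paper), a polynomial is held as its list of coefficients, highest power first, in powers of $(x + r)$. Repeated synthetic division with $a = 1$ re-expands it in powers of $(x + r + 1)$, and subtracting that expansion from the original coefficients term by term gives $\Delta f$, following the derivation above.

```python
# Illustrative sketch; function names are not from the paper.

def shift_expand(coeffs):
    """Re-expand a polynomial given in powers of y into powers of (y + 1)
    by repeated synthetic division with root -1 (highest power first)."""
    c = list(coeffs)
    for end in range(len(c) - 1, 0, -1):
        for i in range(1, end + 1):
            c[i] -= c[i - 1]
    return c

def successive_differences(coeffs):
    """Return [f, Δf, Δ²f, ...]; entry r lists Δ^r f in powers of (x + r)."""
    p = list(coeffs)
    result = [p]
    while len(p) > 1:
        q = shift_expand(p)  # the same polynomial, in powers of (x + r + 1)
        # E f keeps the old coefficients on the new powers, so Δf = Ef - f
        # loses its leading term and keeps the differences p_i - q_i
        p = [pi - qi for pi, qi in zip(p[1:], q[1:])]
        result.append(p)
    return result

for r, row in enumerate(successive_differences([5, 3, 7, -2, 3])):
    print(r, row)
```

For the example above, the rows should reproduce the coefficients 20, -21, 25, -11; 60, -102, 66; 120, -162; and 120.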



SOME PRACTICAL INTERPOLATION FORMULAS 

By John L. Roberts 

Sometimes we wish to find, by means of interpolation, an approximation to a particular value of $w_x$ in the interval between the known values $w_0$ and $w_1$. But it may also be desirable to interpolate, in the interval from $w_0$ to $w_1$, several approximations to $w_x$ at equidistant values of $x$. It is very important to know that a formula which might be very satisfactory for interpolating a particular value in an interval might seriously fail to be the most satisfactory formula when several values are to be interpolated in the same interval. The scope of this paper is limited to finding, by means of interpolation, several approximations to the true value of $w_x$ in the interval from $w_0$ to $w_1$ at equidistant values of $x$.

One way to perform an interpolation of this sort is to use osculatory interpolation.¹ The real function of osculatory interpolation is to secure smoothness at the known points, which are sometimes called pivotal points. By roughness is meant that one or more of the successive derivatives are discontinuous at the pivotal points. Experience proves that the osculatory formulas usually secure smoothness either at the expense of labor or by a loss of accuracy over the entire range from $w_0$ to $w_1$. Frequently the function of interpolation formulas is to save labor. In many cases it appears reasonable to save labor by a loss of both smoothness and accuracy. Formulas are herein selected, without direct regard for smoothness, so as to secure the best possible compromise between a maximum of accuracy and a minimum of labor. It appears that in many cases this results in a loss of smoothness that is no more objectionable than the loss in accuracy.

The actuarial profession, while trying to perfect its methods of constructing mortality tables, has made contributions of a high order of scholarship to the theory of osculatory interpolation. But since the statistician, the astronomer, the physicist, and other scientists also have occasion to make interpolations, it seems very important to discuss the problem of finding the most practical methods of interpolation, not only from the special viewpoint of the actuary, but also from the general viewpoint of mathematics.

$\Delta w_x$ is called the first difference of $w_x$ and may be defined by $\Delta w_x = w_{x+1} - w_x$.

¹ Since this paper presupposes certain knowledge on the part of the reader, it may be worthwhile to indicate some sources of this knowledge. The elementary parts of this knowledge can be found in any good book on finite differences. “Population Statistics and Their Compilation” by Hugh H. Wolfenden, published by the Actuarial Society of America, contains an excellent summary of osculatory interpolation. This summary indicates some valuable sources of information.
Second, third, and higher differences are merely successive differences of the first. When central difference interpolation formulas are used, it is convenient to adopt Woolhouse's notation, which is defined by the following equations: $\Delta w_{-2} = a_{-1}$, $\Delta w_{-1} = a_0$, $\Delta w_0 = a_1$, $\Delta w_1 = a_2$, $\Delta^2 w_{-2} = b_{-1}$, $\Delta^2 w_{-1} = b_0$, $\Delta^2 w_0 = b_1$, $\Delta^3 w_{-2} = c_0$, $\Delta^3 w_{-1} = c_1$, $\Delta^4 w_{-2} = d_0$, $\Delta^6 w_{-3} = f_0$, etc.

An important family of curves can be represented by

$$u_x = u_0 + x a_1 + \tfrac{1}{2}x(x-1)B + \tfrac{1}{6}x(x-1)\left(x - \tfrac{1}{2}\right)C. \tag{1}$$

Assume $u_0 = w_0$ and $\Delta u_0 = \Delta w_0$. Then a study of (1) shows that $a_1$, which has already been defined, must be a factor in the second term in order that (1) may be satisfied when $x = 1$. (1) is a third degree equation. However, if $C = 0$, (1) becomes a second degree equation; if both $B = 0$ and $C = 0$, (1) becomes a first degree equation. In other words, by giving $B$ and $C$ proper values, (1) can be made to become many different interpolation formulas.

For many purposes interpolation by a first degree formula is not sufficiently accurate. We might, therefore, wish to interpolate by either a second or a third degree formula. Since it is possible to draw an unlimited number of second degree or third degree curves between the points $P_0$ and $P_1$, the problem of selecting the best second degree interpolation curve and the best third degree curve is of great practical importance.

I 

Suppose that $w_{-2}$, $w_{-1}$, $w_0$, $w_1$, $w_2$, and $w_3$ can be found in a table of values of the function $w_x$, and that we wish to find by means of interpolation several approximate values of $w_x$ in the interval from $w_0$ to $w_1$. These six given values of $w_x$ can be used to determine six pivotal points, which determine a fifth degree curve. Suppose this curve represents the function $v_x$. Then $w_x$ and $v_x$ would have exactly the same values at the six pivotal points, but would have values which are only approximately the same at other points. Using the first six terms of the Gauss central difference interpolation formula, we have

$$\begin{aligned}
v_x = v_0 + x a_1 &+ \tfrac{1}{2}x(x-1)b_0 + \tfrac{1}{6}(x+1)x(x-1)c_1 + \tfrac{1}{24}(x+1)x(x-1)(x-2)d_0\\
&+ \tfrac{1}{120}(x+2)(x+1)x(x-1)(x-2)e_1.
\end{aligned}$$
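As a concrete check on the six-term Gauss formula and the Woolhouse labels, the sketch below builds the difference table from $w_{-2}, \dots, w_3$, picks out $a_1$, $b_0$, $c_1$, $d_0$, $e_1$, and evaluates $v_x$. The helper names are illustrative. Exact rational arithmetic from Python's `fractions` module makes it easy to confirm that a fifth degree polynomial is reproduced exactly.

```python
from fractions import Fraction

# Illustrative sketch; function names are not from the paper.

def difference_table(values):
    """table[k][j] is the k-th forward difference starting at values[j]."""
    table = [list(values)]
    while len(table[-1]) > 1:
        prev = table[-1]
        table.append([b - a for a, b in zip(prev, prev[1:])])
    return table

def gauss_six_term(w, x):
    """w = [w_-2, w_-1, w_0, w_1, w_2, w_3]; index j of a row refers to w_(j-2)."""
    t = difference_table(w)
    a1, b0, c1, d0, e1 = t[1][2], t[2][1], t[3][1], t[4][0], t[5][0]
    return (w[2] + x * a1
            + x * (x - 1) / 2 * b0
            + (x + 1) * x * (x - 1) / 6 * c1
            + (x + 1) * x * (x - 1) * (x - 2) / 24 * d0
            + (x + 2) * (x + 1) * x * (x - 1) * (x - 2) / 120 * e1)

quintic = lambda t: t**5 - 3 * t**3 + 2 * t + 1
w = [Fraction(quintic(t)) for t in range(-2, 4)]
print(gauss_six_term(w, Fraction(1, 3)), quintic(Fraction(1, 3)))
```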

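It is proper to use in this formula the differences $a_1$, $b_0$, etc., which have already been defined as differences of $w_x$, because these differences are exactly equal to the corresponding differences of $v_x$. Suppose $P_0$, $P_{1/3}$, $P_{2/3}$, and $P_1$ are four points which are determined by $v_x$. Then $B$ and $C$ can be determined so that (1) will represent the curve which goes through these four points.

Then

$$u_{1/3} = u_0 + \tfrac{1}{3}a_1 - \tfrac{1}{9}\left(B - \tfrac{1}{18}C\right)$$

and

$$v_{1/3} = u_0 + \tfrac{1}{3}a_1 - \tfrac{1}{9}\left(b_0 + \tfrac{4}{9}c_1 - \tfrac{5}{27}d_0 - \tfrac{7}{81}e_1\right).$$

Also

$$u_{2/3} = u_0 + \tfrac{2}{3}a_1 - \tfrac{1}{9}\left(B + \tfrac{1}{18}C\right)$$

and

$$v_{2/3} = u_0 + \tfrac{2}{3}a_1 - \tfrac{1}{9}\left(b_0 + \tfrac{5}{9}c_1 - \tfrac{5}{27}d_0 - \tfrac{8}{81}e_1\right).$$

Since $u_{1/3} = v_{1/3}$ and $u_{2/3} = v_{2/3}$, we have two equations, which can be solved for $B$ and $C$:

$$B = b - \tfrac{5}{27}d \quad\text{and}\quad C = c_1 - \tfrac{1}{9}e_1, \tag{2}$$

where $b$ and $d$ are defined by

$$b = \tfrac{1}{2}(b_0 + b_1) \quad\text{and}\quad d = \tfrac{1}{2}(d_0 + d_1).$$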
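The elimination that leads to (2) can be reproduced in exact arithmetic. Taking (1) as $u_x = u_0 + xa_1 + \tfrac12x(x-1)B + \tfrac16x(x-1)(x-\tfrac12)C$, the sketch below evaluates the six-term Gauss formula at $x = \tfrac13$ and $x = \tfrac23$ and solves the two resulting linear equations for $B$ and $C$. It then compares the result with $b - \tfrac{5}{27}d$ and $c_1 - \tfrac19e_1$ for arbitrary sets of six tabular values. The helper names are illustrative only.

```python
from fractions import Fraction as F

# Illustrative sketch; function names are not from the paper.

def diffs(w):
    """Woolhouse differences from w = [w_-2, ..., w_3]."""
    t = [list(w)]
    while len(t[-1]) > 1:
        t.append([b - a for a, b in zip(t[-1], t[-1][1:])])
    return dict(a1=t[1][2], b0=t[2][1], b1=t[2][2], c1=t[3][1],
                d0=t[4][0], d1=t[4][1], e1=t[5][0])

def gauss(w, x):
    """Six-term Gauss forward formula, v_x."""
    d = diffs(w)
    return (w[2] + x * d["a1"] + x * (x - 1) / 2 * d["b0"]
            + (x + 1) * x * (x - 1) / 6 * d["c1"]
            + (x + 1) * x * (x - 1) * (x - 2) / 24 * d["d0"]
            + (x + 2) * (x + 1) * x * (x - 1) * (x - 2) / 120 * d["e1"])

def solve_B_C(w):
    """From (1): u_1/3 - u_0 - a1/3 = -B/9 + C/162, u_2/3 - u_0 - 2a1/3 = -B/9 - C/162."""
    d = diffs(w)
    r1 = gauss(w, F(1, 3)) - w[2] - d["a1"] / 3
    r2 = gauss(w, F(2, 3)) - w[2] - 2 * d["a1"] / 3
    return -F(9, 2) * (r1 + r2), 81 * (r1 - r2)

w = [F(v) for v in (3, -1, 4, 1, -5, 9)]
B, C = solve_B_C(w)
d = diffs(w)
print(B == (d["b0"] + d["b1"]) / 2 - F(5, 27) * (d["d0"] + d["d1"]) / 2)
print(C == d["c1"] - F(1, 9) * d["e1"])
```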

A study of (1) shows that $u_{1/2}$ does not depend upon $C$, because the term containing $C$ becomes zero when $x = \tfrac{1}{2}$. It also shows that $u_x$, over the entire range from $u_0$ to $u_1$, is more sensitive to errors in $B$ than to errors in $C$. The $B$ in (2) usually contains some error, because the six terms of the Gauss formula which were used in determining $B$ usually produce results which are only approximate. Consequently a comparatively large error in $C$ would not produce an important error.

Assume

$$B = b - \tfrac{5}{27}d \quad\text{and}\quad C = c_1 - \tfrac{5}{27}e_1. \tag{3}$$

$B$ is the same in both (2) and (3), but $C$ is not the same. The accuracy of (2) and the accuracy of (3) do not differ by an important amount. On the other hand, if any attempt to apply (2) is compared with the working illustrations of (3) in this article, it will be found that (2) is considerably more laborious than (3). Therefore (3) is a better compromise between a maximum of accuracy and a minimum of labor than (2). For this reason (2) ought not to be regarded as a practical formula. On the other hand, (2), because of its great accuracy, serves as an ideal with which other formulas can be compared. In other words, (2) is of theoretical importance.

In like manner another interpolation formula can be found if we use the first four terms of the Gauss formula to determine $P_{1/2}$. Then

$$u_{1/2} = u_0 + \tfrac{1}{2}a_1 - \tfrac{1}{8}B$$

and

$$v_{1/2} = u_0 + \tfrac{1}{2}a_1 - \tfrac{1}{8}\left(b_0 + \tfrac{1}{2}c_1\right).$$

Since $u_{1/2} = v_{1/2}$, we can solve for $B$, and $C$ is left arbitrary. If $C = 0$, we again get an excellent compromise between a maximum of accuracy and a minimum of labor. The following second degree formula results:

$$B = b \quad\text{and}\quad C = 0. \tag{4}$$


In order that the value of (3) and (4) may be appreciated, they are herein 
compared with some other formulas which have been of historical importance. 

If the point $P_{1/2}$ can first be accurately determined, a second degree curve through the points $P_0$, $P_{1/2}$, and $P_1$ would probably give more accurate results than such a curve through the points $P_0$, $P_1$, and $P_2$, because the first three points are in a smaller neighborhood; the second curve can be represented by the first three terms of the Gregory-Newton interpolation formula. The points $P_{-1}$, $P_0$, $P_1$, and $P_2$ determine a third degree curve, which can be represented by the first four terms of the Gauss central difference formula. It is probable that these terms would determine $P_{1/2}$ much more accurately than the first three terms of the Gregory-Newton formula, because the latter is not a central difference formula with respect to $P_{1/2}$ and because four terms usually give more accurate results than only three terms. Consequently there is a strong probability that (4) is more accurate than the first three terms of the Gregory-Newton formula. In like manner (4) is more accurate than the first three terms of the Gauss formula. It is interesting to observe that (4) is the first three terms of the Newton-Bessel formula.


If $B = b$ and $C = 3c_1$, then (1) is equivalent to Karup's osculatory interpolation formula in terms of differences taken centrally. $B$ is the same in both (4) and Karup's formula. No interpolation formula can be very accurate unless $C$ is about equal to $c_1$. Since, then, the error in $C$ in Karup's formula is about twice as great as the error in $C$ in (4), his formula is distinctly less accurate than (4). Since (4) is a second degree curve and Karup's formula is a third degree curve, his formula is very much more laborious. (4) is extremely accurate for a formula having its labor-saving properties; for many purposes its roughness and its inaccuracy appear to be in about the right proportion. On the other hand, Karup's formula is extremely inaccurate for a formula so laborious; its only good point is its smoothness.
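The identification of Karup's formula with $B = b$, $C = 3c_1$ in (1) can be checked directly. The sketch below assumes Karup's formula in the form usually quoted as the Karup-King formula, $u_x = xw_1 + \xi w_0 + \tfrac12x^2(x-1)\Delta^2w_0 + \tfrac12\xi^2(\xi-1)\Delta^2w_{-1}$ with $\xi = 1 - x$. That form is not given in the text. The sketch compares it with (1), read as $u_0 + xa_1 + \tfrac12x(x-1)B + \tfrac16x(x-1)(x-\tfrac12)C$, in exact arithmetic. Function names are illustrative.

```python
from fractions import Fraction as F

# Illustrative sketch; function names are not from the paper.

def karup_king(wm1, w0, w1, w2, x):
    b0 = w1 - 2 * w0 + wm1          # second difference centred at w0
    b1 = w2 - 2 * w1 + w0           # second difference centred at w1
    xi = 1 - x
    return x * w1 + xi * w0 + x * x * (x - 1) / 2 * b1 + xi * xi * (xi - 1) / 2 * b0

def formula_1(w0, w1, x, B, C):
    """Equation (1) with u_0 = w_0 and a_1 = w_1 - w_0."""
    return w0 + x * (w1 - w0) + x * (x - 1) / 2 * B + x * (x - 1) * (x - F(1, 2)) / 6 * C

def karup_as_formula_1(wm1, w0, w1, w2, x):
    b0, b1 = w1 - 2 * w0 + wm1, w2 - 2 * w1 + w0
    return formula_1(w0, w1, x, B=(b0 + b1) / 2, C=3 * (b1 - b0))

pts = [F(v) for v in (2, 7, 1, 8)]
print(all(karup_king(*pts, F(k, 10)) == karup_as_formula_1(*pts, F(k, 10)) for k in range(11)))
```

Since both sides are cubics in $x$ that agree identically, the check holds for any $x$, not just the sampled tenths.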

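Changing somewhat the meanings of $u$ and $x$, (3) may be written

$$\begin{aligned}
u_{n+x} = u_n + x\,\Delta u_n &+ \tfrac{1}{2}x(x-1)\left[\tfrac{1}{2}\left(\Delta^2u_n + \Delta^2u_{n-1}\right) - \tfrac{5}{54}\left(\Delta^4u_{n-1} + \Delta^4u_{n-2}\right)\right]\\
&+ \tfrac{1}{6}x(x-1)\left(x - \tfrac{1}{2}\right)\left[\Delta^3u_{n-1} - \tfrac{5}{27}\Delta^5u_{n-2}\right];
\end{aligned}$$

then the right-hand value of $du/dx$ at $P_0$ exceeds the left-hand value by

$$\tfrac{1}{54}d_0 + \tfrac{5}{162}f_0,$$

which is the amount of discontinuity in $du/dx$ at $P_0$. (3) has greater smoothness than (4); in other words, (3) is more like an osculatory formula. On the other hand,

$$B = b - \tfrac{1}{6}d \quad\text{and}\quad C = c_1 - \tfrac{1}{2}e_1, \tag{5}$$

which is equivalent to an important osculatory interpolation formula by Mr. Robert Henderson, compares much better with (3) from the viewpoint of labor saving and accuracy than Karup's formula does with (4).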
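The smoothness comparison can be checked numerically. The sketch below reads (3) as $B = b - \tfrac{5}{27}d$, $C = c_1 - \tfrac{5}{27}e_1$, (4) as $B = b$, $C = 0$, and (5) as $B = b - \tfrac16d$, $C = c_1 - \tfrac12e_1$. For a run of seven tabular values it builds the cubic (1) on the two intervals meeting at a pivotal point and measures the jump in $du/dx$ there. The derivative of (1) is $a_1 - \tfrac12B + \tfrac1{12}C$ at $x = 0$ and $a_1 + \tfrac12B + \tfrac1{12}C$ at $x = 1$. Working through the algebra, the jump comes to $\tfrac1{54}d_0 + \tfrac5{162}f_0$ for (3), $-\tfrac14d_0$ for (4), and zero for (5). The helper names are illustrative.

```python
from fractions import Fraction as F
from math import comb

# Illustrative sketch; function names are not from the paper.
RULES = {  # rule number -> (coefficient of d in B, coefficient of e1 in C); None means C = 0
    3: (F(5, 27), F(5, 27)),
    4: (F(0), None),
    5: (F(1, 6), F(1, 2)),
}

def delta(w, k, j):
    """k-th forward difference of w starting at index j."""
    return sum((-1) ** (k - i) * comb(k, i) * w[j + i] for i in range(k + 1))

def B_and_C(w, n, rule):
    """B and C of (1) for the interval from w[n] to w[n+1]."""
    lam, mu = RULES[rule]
    b = (delta(w, 2, n - 1) + delta(w, 2, n)) / 2
    d = (delta(w, 4, n - 2) + delta(w, 4, n - 1)) / 2
    C = F(0) if mu is None else delta(w, 3, n - 1) - mu * delta(w, 5, n - 2)
    return b - lam * d, C

def derivative_jump(w, n, rule):
    """Right-hand minus left-hand du/dx at the pivotal point w[n]."""
    B_r, C_r = B_and_C(w, n, rule)        # interval n, at x = 0
    B_l, C_l = B_and_C(w, n - 1, rule)    # interval n - 1, at x = 1
    right = delta(w, 1, n) - B_r / 2 + C_r / 12
    left = delta(w, 1, n - 1) + B_l / 2 + C_l / 12
    return right - left

w = [F(v) for v in (2, 7, 1, 8, 2, 8, 1)]   # w[3] plays the part of P_0
d0, f0 = delta(w, 4, 1), delta(w, 6, 0)     # Δ⁴w_-2 and Δ⁶w_-3 relative to P_0
print(derivative_jump(w, 3, 3), d0 / 54 + 5 * f0 / 162)
```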

II 

An excellent formula can easily be spoiled if the method of applying it is not practical. Mr. Henderson, in the Transactions of the Actuarial Society of America, Vol. IX, applies (5) in such a way that the numerical work is very convenient. Some writers seem to have been very careless about this matter. A method intended to interpolate several values between $w_0$ and $w_1$ should provide that the end value $w_1$ shall be exactly reproduced if no error is made in the computation. In other words, a good method should provide a check upon the work. At the same time, in order to avoid unnecessary labor, the work should not retain unnecessary decimal places or figures; in other words, fictitious accuracy should be avoided. The following working illustrations are intended to show good methods of applying the formulas and how much labor is necessary in order to apply them; the size of the errors can also be used to illustrate the theory.
When (4) is applied at either end of the table, where terms are not available for the calculation of the differences required, it should be assumed that the fourth differences that cannot be computed vanish, and the required differences should be filled in consistently with that assumption. $\Delta w_x$ represents the first differences, but it is convenient to have $s$ represent the first differences in such a manner that they are arranged centrally in the working illustration. $s^2$ in like manner represents the second differences. The 2 in $s^2$ means that $s^2$ is a second difference; it does not have the familiar meaning used in algebra. In the case of (4), $\Delta u_x = a_1 + xB$, $\Delta^2u_x = B$, and the higher differences all equal zero. Since we wish in the working illustration of (4) to interpolate four values between $w_0$ and $w_1$, $\delta$ and $\delta^2$ are defined by $\delta u_x = u_{x+.2} - u_x$ and $\delta^2u_x = \delta u_{x+.2} - \delta u_x$. It is proved in any good book on finite differences that $\Delta$ and $\delta$, which are symbols of operation, can under suitable conditions be separated from the functions upon which they operate and treated as if they were algebraic numbers. Consequently $1 + \delta = (1 + \Delta)^{.2}$. In other words, by means of the binomial law, $\delta u_x = (.2\Delta - .08\Delta^2)u_x$, where all the terms within the parenthesis are to be considered operators, and $\delta^2u_x = .04\Delta^2u_x$. $s$ and $s^2$ are defined by $s = s_x = \delta u_x$ and $s^2 = s^2_x = \delta^2u_x$; the middle $s = \delta u_{.4} = .2a_1$, and $s^2 = .04B = .02(b_0 + b_1)$. We are now in a position to apply (4) to the case when $w_x = (1.04)^n$. It might prevent confusion if it is stated that $x$ and $n$ are related to each other in such a way that we always interpolate between $w_0$ and $w_1$.


  n   (1.04)^n        s       Δ      Δ²      s²
 80   23.050      .9218            .845
 81   23.9718     .9603
 82   24.9321     .9988   4.994           .0385
 83   25.9309    1.0373
 84   26.9682    1.0758
 85   28.044     1.1190           1.081
 86   29.1630    1.1670
 87   30.3300    1.2150   6.075           .0480
 88   31.5450    1.2630
 89   32.8080    1.3110
 90   34.119     1.3636           1.317
 91   35.4826    1.4210
 92   36.9036    1.4784   7.392           .0574
 93   38.3820    1.5358
 94   39.9178    1.5932
 95   41.511                      1.553
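The working illustration can be regenerated from the four tabular values alone. The sketch below is one reading of the procedure described above. It assumes that the missing end second differences are obtained by carrying the third difference forward and backward (so the fourth differences vanish). It takes the middle $s = .2a_1$ and $s^2 = .02(b_0 + b_1)$, each rounded to four places, and builds each interval's five values of $s$ outward from the middle. Python's `decimal` module keeps the arithmetic in exact decimal digits. The function name is illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

# Illustrative sketch; the function name is not from the paper.
PLACES = Decimal("0.0001")

def formula_4_table(w):
    """Interpolate four values between consecutive pivotal values w (Decimals) by (4)."""
    d1 = [b - a for a, b in zip(w, w[1:])]
    d2 = [b - a for a, b in zip(d1, d1[1:])]
    d3 = [b - a for a, b in zip(d2, d2[1:])]
    # end second differences, assuming the fourth differences vanish
    d2 = [d2[0] - d3[0]] + d2 + [d2[-1] + d3[-1]]
    values = [w[0]]
    for i, a1 in enumerate(d1):
        s_mid = (a1 / 5).quantize(PLACES, rounding=ROUND_HALF_UP)                  # .2 a1
        s2 = ((d2[i] + d2[i + 1]) / 50).quantize(PLACES, rounding=ROUND_HALF_UP)   # .02(b0 + b1)
        for k in range(5):
            values.append(values[-1] + s_mid + (k - 2) * s2)
    return values

pivots = [Decimal(v) for v in ("23.050", "28.044", "34.119", "41.511")]  # n = 80, 85, 90, 95
for n, value in zip(range(80, 96), formula_4_table(pivots)):
    print(n, value)
```

Because the five values of $s$ in each interval are symmetric about the middle one, their sum is exactly $a_1$, which is why the pivotal values 28.044, 34.119, and 41.511 are reproduced as the check requires.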
Some of the explanation of the application of (4) applies to (3) and does not need to be repeated. The method used here for applying (3) is either the same as, or a development of, the Henderson method of applying (5). If it is desired to apply (3) at either end of the table, where terms are not available for the calculation of the differences required, it can be assumed that the sixth differences that cannot be computed vanish, and the required differences can be filled in consistently with that assumption. A study of the theory underlying this assumption shows that it does not result in a true central difference formula and that it consequently usually results in some loss of accuracy. In the case of (3), before the differences of (1) are found, it is convenient to write (1) as follows:

$$u_x = u_0 + x a_1 + \tfrac{1}{2}x(x-1)\left(B + \tfrac{1}{2}C\right) + \tfrac{1}{6}x(x-1)(x-2)C.$$

Then

$$\Delta u_x = a_1 + \left(B + \tfrac{1}{2}C\right)x + \tfrac{1}{2}x(x-1)C, \qquad \Delta^2u_x = B + \tfrac{1}{2}C + xC, \qquad \Delta^3u_x = C.$$

Suppose we wish to interpolate four values between $w_0$ and $w_1$. $\delta$ and $\delta^2$ have already been defined, and $\delta^3u_x = \delta^2u_{x+.2} - \delta^2u_x$. Then $1 + \delta = (1 + \Delta)^{.2}$, or $\delta u_x = (.2\Delta - .08\Delta^2 + .048\Delta^3)u_x$. Also $\delta^2u_x = (.04\Delta^2 - .032\Delta^3)u_x$ and $\delta^3u_x = .008\Delta^3u_x$. $s^2$ and $s^3$ are defined by $s^2 = s^2_x = \delta^2u_x$ and $s^3 = s^3_x = \delta^3u_x$. The first

$$s^2 = \delta^2u_{-.2} = .04\left(b_0 - \tfrac{5}{27}d_0\right).$$

The last

$$s^2 = \delta^2u_{.8} = .04\left(b_1 - \tfrac{5}{27}d_1\right).$$

.1852 might be a useful approximation to $\tfrac{5}{27}$. The remaining values of $s^2$ should be filled in so that they are in arithmetical progression, with irregularities at the ends. If the irregularities can be distributed equally at both ends, they cause an error in $C$, but none in $B$. Errors in $B$ are more important than those in $C$. The middle $s = \delta u_{.4} = .2a_1 - s^3$. In the following working illustration, $w_x = \sin n$.


  n      sin n         s        s²        s³        s⁴
-60    -.86603    .36603
-30    -.50000    .50000    .13397   -.13397
  0     .00000    .50000    .00000   -.13397    .00000
 30     .50000    .36603   -.13397   -.09809    .03588
 60     .86603    .13397   -.23206
 90    1.00000






  n      sin n          s         s²         s³
  0     .00000    .104498    .000000
  6    .104498    .103374   -.001124
 12    .207872    .101125   -.002249   -.001125
 18    .308997    .097751   -.003374
 24    .406748    .093252   -.004499
 30     .50000              -.005624
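The constants behind this illustration can be recomputed from the coarse table. The sketch below takes the six given values of $\sin n$ at $n = -60, -30, \dots, 90$ and forms $B$ and $C$ by (3), read as $B = b - \tfrac{5}{27}d$ and $C = c_1 - \tfrac{5}{27}e_1$. It then evaluates $s^3 = .008C$, the middle $s = .2a_1 - s^3$, the end values of $s^2$, and the interpolated values from (1). In this reading, the results agree with the worked figures to within a few units in the sixth decimal place. The small differences presumably come from rounding and from the irregular end values of $s^2$. Helper names are illustrative.

```python
from math import comb

# Illustrative sketch; function names are not from the paper.

def woolhouse(w):
    """w = [w_-2, ..., w_3]; forward differences named as in the text."""
    def delta(k, j):  # Δ^k w_(j-2)
        return sum((-1) ** (k - i) * comb(k, i) * w[j + i] for i in range(k + 1))
    return dict(a1=delta(1, 2), b0=delta(2, 1), b1=delta(2, 2), c1=delta(3, 1),
                d0=delta(4, 0), d1=delta(4, 1), e1=delta(5, 0))

def formula_3_constants(w, lam=5 / 27):
    d = woolhouse(w)
    B = (d["b0"] + d["b1"]) / 2 - lam * (d["d0"] + d["d1"]) / 2
    C = d["c1"] - lam * d["e1"]
    return d, B, C

def u(x, u0, a1, B, C):
    """Equation (1)."""
    return u0 + x * a1 + x * (x - 1) / 2 * B + x * (x - 1) * (x - 0.5) / 6 * C

sines = [-0.86603, -0.50000, 0.00000, 0.50000, 0.86603, 1.00000]   # n = -60, ..., 90
d, B, C = formula_3_constants(sines)
s3 = 0.008 * C
print("s3", round(s3, 6), "middle s", round(0.2 * d["a1"] - s3, 6))
print("first s2", round(0.04 * (d["b0"] - 5 / 27 * d["d0"]), 6),
      "last s2", round(0.04 * (d["b1"] - 5 / 27 * d["d1"]), 6))
print([round(u(k / 5, sines[2], d["a1"], B, C), 6) for k in range(6)])
```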



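Suppose we wish to interpolate nine values between $w_0$ and $w_1$ by the use of (3). Then $\delta u_x = u_{x+.1} - u_x$, $\delta^2u_x = \delta u_{x+.1} - \delta u_x$, and $\delta^3u_x = \delta^2u_{x+.1} - \delta^2u_x$. Consequently $1 + \delta = (1 + \Delta)^{.1}$, or $\delta u_x = (.1\Delta - .045\Delta^2 + .0285\Delta^3)u_x$. Then $\delta^2u_x = (.01\Delta^2 - .009\Delta^3)u_x$ and $\delta^3u_x = .001\Delta^3u_x$; $s^2 = s^2_x = \delta^2u_x$ and $s^3 = s^3_x = \delta^3u_x$. The first

$$s^2 = \delta^2u_{-.1} = .01\left(B - \tfrac{1}{2}C\right).$$

The last

$$s^2 = \delta^2u_{.9} = .01\left(B + \tfrac{1}{2}C\right).$$

$$\delta u_{.4} = (.1a_1 - 4s^3) - \tfrac{1}{2}\delta^2u_{.4} \quad\text{and}\quad \delta u_{.5} = (.1a_1 - 4s^3) + \tfrac{1}{2}\delta^2u_{.4}.$$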
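The operator expansions used in these illustrations follow from treating $\Delta$ as an algebraic number in $1 + \delta = (1 + \Delta)^{1/z}$ and multiplying out. The sketch below keeps terms through $\Delta^3$, the highest difference of (1) that is not zero. It reproduces the coefficients quoted for $z = 5$ and $z = 10$ and gives them for any other $z$, for instance $z = 6$. The function name is illustrative.

```python
from fractions import Fraction

# Illustrative sketch; the function name is not from the paper.

def delta_powers(z, order=3):
    """Coefficients of δ, δ², ..., δ^order in powers of Δ, where 1 + δ = (1 + Δ)^(1/z).
    Each series is a list indexed by the power of Δ, truncated after Δ^order."""
    p = Fraction(1, z)
    delta = [Fraction(0)] * (order + 1)
    coeff = Fraction(1)
    for k in range(1, order + 1):
        coeff = coeff * (p - (k - 1)) / k     # binomial coefficient C(p, k)
        delta[k] = coeff
    powers, current = [delta], delta
    for _ in range(order - 1):
        product = [Fraction(0)] * (order + 1)
        for i, x in enumerate(current):
            for j, y in enumerate(delta):
                if i + j <= order:
                    product[i + j] += x * y
        powers.append(product)
        current = product
    return powers

for z in (5, 10):
    for r, series in enumerate(delta_powers(z), start=1):
        print(z, r, [float(c) for c in series[1:]])
```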
  n      sin n          s         s²         s³
  0     .00000    .052318    .000000
  3    .052318    .052179   -.000139
  6    .104497    .051899   -.000280
  9    .156396    .051478   -.000421
 12    .207874    .050916   -.000562   -.000141
 15    .258790    .050212   -.000703
 18    .309002    .049368   -.000844
 21    .358370    .048383   -.000985
 24    .406753    .047257   -.001126
 27    .454010    .045990   -.001267
 30     .50000              -.001406


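Suppose we wish to interpolate five values between $w_0$ and $w_1$. The first

$$s^2 = \delta^2u_{-1/6} = \tfrac{1}{36}\left(B - \tfrac{1}{2}C\right),$$

and

$$\delta u_{1/3} = \tfrac{1}{6}(a_1 - 8s^3) - \tfrac{1}{2}\delta^2u_{1/3}, \qquad \delta u_{1/2} = \tfrac{1}{6}(a_1 - 8s^3) + \tfrac{1}{2}\delta^2u_{1/3}.$$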

In the following working illustration the given values of $\sin n$ are written correct to five decimal places; in other words, after each decimal point there are five symbols, or digits, representing numbers, and each of these symbols is written in the scale of ten. It can be observed that some values of $u_x$, $s$, and $s^2$ in the working illustration have six symbols to the right of the decimal point, and that some values have seven. In all cases the sixth symbol to the right of the decimal point is written in the scale of ten, and the seventh symbol is written in the scale of six. This procedure provides a check by exactly reproducing $w_1$. Also, this procedure does not cause much fictitious accuracy, and it can be quickly used after a little practice.


  n       sin n           s         s²         s³
  0      .00000    .0871305    .000000
  5    .0871305    .0864795   -.000651
 10    .1736104    .0851775   -.001302
 15    .2587883    .0832245   -.001953
 20    .3420132    .0806205   -.002604
 25    .4226341    .0773655   -.003255   -.000651
 30      .50000               -.003906
In general, if we wish to interpolate $z - 1$ values between $w_0$ and $w_1$ when $z$ is neither five nor ten, $w_1$ can be exactly reproduced if some of the symbols are written in the scale of $z$. If $z = 12$, it is evident that we need two extra symbols, say t and e, to stand for ten and eleven respectively. If we wish to interpolate $z - 1$ values between $w_0$ and $w_1$ by the use of (4), then in the computation each value of $u_x$ and $s$, except the given values, should contain one more symbol than each given value contains, and the extra symbol should be written in the scale of $z$.
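The device of writing one extra symbol in the scale of $z$ can be sketched as follows. The helpers (hypothetical names) read and write values whose first six decimals are in the scale of ten and whose optional seventh symbol is in the scale of $z$. They use t and e for ten and eleven, as suggested above. Exact fractions then confirm that the six values of $s$ in the illustration with $z = 6$ add up to exactly $w_1 = .5$.

```python
from fractions import Fraction

# Illustrative sketch; read_mixed and write_mixed are hypothetical helpers.
SYMBOLS = "0123456789te"   # t = ten, e = eleven, as suggested for z = 12

def read_mixed(text, z, decimals=6):
    """Value of '.dddddd' plus an optional final symbol written in the scale of z."""
    digits = text.lstrip(".")
    head, tail = digits[:decimals].ljust(decimals, "0"), digits[decimals:]
    value = Fraction(int(head), 10 ** decimals)
    if len(tail) > 1:
        raise ValueError("only one symbol beyond the sixth decimal is supported")
    if tail:
        digit = SYMBOLS.index(tail)
        if digit >= z:
            raise ValueError("symbol not allowed in the scale of z")
        value += Fraction(digit, z * 10 ** decimals)
    return value

def write_mixed(value, z, decimals=6):
    """Inverse of read_mixed for values below 1 that are multiples of 1/(z * 10**decimals)."""
    units = value * z * 10 ** decimals
    if units.denominator != 1:
        raise ValueError("value needs more than one extra symbol")
    head, last = divmod(int(units), z)
    return "." + str(head).zfill(decimals) + SYMBOLS[last]

s_values = [".0871305", ".0864795", ".0851775", ".0832245", ".0806205", ".0773655"]
print(sum(read_mixed(s, 6) for s in s_values))
print(write_mixed(read_mixed(".1736104", 6) - read_mixed(".0871305", 6), 6))
```

The first line prints 1/2, and the second recovers .0864795, the second value of $s$ in the table.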



ON EVALUATING A COEFFICIENT OF PARTIAL CORRELATION 

By Grace Strecker 


It is to be shown here that when the multiple correlation coefficient $R_{n.12\cdots(n-1)}$ is found by the method of Horst,¹ the partial correlation coefficient $r_{n(n-1).12\cdots(n-2)}$ can be found in terms of the $\beta$'s. If we are interested only in evaluating a partial correlation between two variables, we may also employ the method which will be given here.

Without loss of generality the dependent variables may be chosen to be the $n$th and $(n-1)$st. The coefficient of partial correlation as given by Rietz² may be expressed in the following form:

$$r^2_{n(n-1).12\cdots(n-2)} = \frac{\dfrac{R_{(n-1)(n-1)}}{R_{(n-1)(n-1)nn}} - \dfrac{R}{R_{nn}}}{\dfrac{R_{(n-1)(n-1)}}{R_{(n-1)(n-1)nn}}}. \tag{1}$$
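Formula (1) rests on the identity $1 - r^2_{n(n-1).12\cdots(n-2)} = (1 - R^2_{n.12\cdots(n-1)})/(1 - R^2_{n.12\cdots(n-2)})$, where $1 - R^2_{n.12\cdots(n-1)} = R/R_{nn}$ and $R_{nn}$, $R_{(n-1)(n-1)}$, $R_{(n-1)(n-1)nn}$ are minors of the correlation determinant. The sketch below is an independent check, not the author's $\beta$ method. It evaluates (1) from minors of a small correlation matrix, computing each determinant as the product of the pivots of Gaussian elimination (in the spirit of $R = \prod\gamma_{ii}$). It then compares the result with the familiar recursion for partial correlations. The matrix is made up for illustration, and the function names are hypothetical.

```python
import math

# Illustrative sketch; function names and the matrix are not from the paper.

def det(matrix):
    """Determinant as the product of elimination pivots (no pivoting needed for
    a positive definite correlation matrix)."""
    a = [row[:] for row in matrix]
    product = 1.0
    for k in range(len(a)):
        pivot = a[k][k]
        product *= pivot
        for i in range(k + 1, len(a)):
            factor = a[i][k] / pivot
            for j in range(k, len(a)):
                a[i][j] -= factor * a[k][j]
    return product

def minor(matrix, drop):
    keep = [i for i in range(len(matrix)) if i not in drop]
    return [[matrix[i][j] for j in keep] for i in keep]

def partial_r2_by_minors(R):
    """Equation (1); variables n-1 and n are the last two rows (0-based n-2, n-1)."""
    n = len(R)
    rest = det(minor(R, {n - 2})) / det(minor(R, {n - 2, n - 1}))  # R_(n-1)(n-1) / R_(n-1)(n-1)nn
    full = det(R) / det(minor(R, {n - 1}))                         # R / R_nn
    return (rest - full) / rest

def partial_r(R, i, j, given):
    """r_ij.given by the usual recursion on the last conditioning variable."""
    if not given:
        return R[i][j]
    *others, k = given
    r_ij, r_ik, r_jk = (partial_r(R, i, j, others), partial_r(R, i, k, others),
                        partial_r(R, j, k, others))
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik**2) * (1 - r_jk**2))

R = [[1.00, 0.50, 0.30, 0.40],
     [0.50, 1.00, 0.20, 0.35],
     [0.30, 0.20, 1.00, 0.45],
     [0.40, 0.35, 0.45, 1.00]]
print(partial_r2_by_minors(R), partial_r(R, 3, 2, [0, 1]) ** 2)
```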


$R_{(n-1)(n-1)}$ may be treated as a new determinant $R'$. Regarding its elements as the coefficients of a set of normal equations ($n - 1$ in all) whose constant terms are zero, we may follow through the Doolittle elimination process. For the case where $n = 4$ we have the table given below.

In comparing this outline with the one illustrating the Doolittle elimination process for $R$ when $n = 4$, we see that


/ ^-^1122 

722 = 722 - 


733 


33 — <^33 — t3 — ^’44 "" 


Therefore, we have 


All rAii2^ 

W ' Wa 


-•(ail - 

\ 2 - 

= ri ~ s • 


¹ Horst, Paul, A Short Method for Solving for a Coefficient of Multiple Correlation, Annals of Mathematical Statistics, Vol. III, No. 1, Feb. 1932, pp. 40-44.

² Rietz, H. L., Mathematical Statistics, p. 101.
[Table: Doolittle elimination process for R' when n = 4]


In the general case: 


Til = Til , 
/ 

72 2 = 722 ) 


7(n— 2)(fi— 2) 7{n— 2)(n-2)) 

7(n— l)(n— 1) = ^nn • 
Hence


n-~2 / n— 1 \ 

1 )(n— 1 ) = yuy^nn — ^ ^ 

n n— 2 

Since $R = \prod_{i=1}^{n}\gamma_{ii}$, then $R_{(n-1)(n-1)nn} = \prod_{i=1}^{n-2}\gamma_{ii}$, from which we see that

n— 2 / n-- 1 \ 

> IT Til ( «nn ~ 0tnj 

Kn— l)(n— 1) 1 \ 2 / 


■K(n— l)(n-l)nn 

But since $\alpha_{nn} = 1$, then


It has been shown that 


if 7.. 

1 


n-1 

= O^nn — ^ ' ^t/t • 


i?(„ 




(n— l) (n — l)nn 


= 1-2 I3.n. 


Substituting the above values for 


^(n i)(n 1 ) ocjuation (1), we have 

■*tnn 


^n(n-l). 12- . • (n-2) = 


/?(» — 1) (rt - l)nn 
/l - 2 |9.n - (^1 - E ^f.n) 

1 - E 


or 


y^nCn-l); 12 • • • (n-2) = 


n^l 

1 - 


Hence it is seen that when the $\beta$'s given by Horst (page 42) are calculated, it is an easy matter to solve for the partial correlation $r_{n(n-1).12\cdots(n-2)}$.


St. Louis University, 



A THEORY OF VALIDATION FOR DERIVATIVE SPECIFICATIONS 

AND CHECK LISTS¹

By Lee Byrne 

Visiting Professor of Secondary Education, New York University 

Part I. Research Products Which May Be Classified as Derivative 
Specifications and Check Lists 

Meaning of Specification 

In specification something is assigned a specific character. The something to be thus assigned a specific character may be called the specificandum. The specific character assigned to the specificandum, or (as a second meaning) the act of so doing, may be called the specification.

A proposition is the smallest unit in which it is possible to embody a complete thought and is ordinarily represented by a single sentence. In specification the characterization may be confined to a single proposition, or it may be extended to include an indefinitely large number of propositions. So a specification may be embodied in a sentence, a paragraph, a chapter, or a whole book. No matter how far it is extended, it will never give complete determination, as our knowledge cannot be made exhaustive nor our control be given an absolute precision.

In view of the meaning assigned to specification, it is evident that very many books and monographs could in this sense be classified as specifications.

Meaning of Derivative Specification 

There is a type of specification (book or monograph) which is developed by deriving it from a group or class of specifications which already exist. This class may be a total class of all such specifications, or a group of those accepted as authoritative, or a group of those taken to be representative. A specification derived in this manner may be called a derivative specification. As an example we could take almost any first-class work by a present-day historian; by historians it would be called “secondary” because it is based on study of pre-existent documents called “primary sources.”

Meaning of Check List 

The act of deriving a product from a pre-existent set of documents may, as we have seen, take the form of a derivative specification, embracing an assemblage of determinates or determinations. On the other hand, the product derived may be intended merely to indicate the ground covered or to be covered by determination, without actually selecting the particular determinations. Such a product will be called a check list. The term is not a very happy one, but it is in very common use. If we think of a specification as an assemblage of determinations, then a check list could be thought of as a corresponding set of determinables.² Since any determinable is capable of an indefinite number of determinations, it is evident that a long check list could give rise to an extremely large number of different specifications, of which, of course, some fraction might prove undesirable, inadmissible, or false.

¹ This paper is an amplification of a report made in the statistical section of the American Educational Research Association at its meeting in February, 1931.

Modes of Specification: How We Specify 

If we examine any specification to see how the specifying is done, we shall find that it ultimately takes the form of specification under aspects. The following diagram indicates the principal (perhaps all the) possibilities in the way of specification.

Naming the original or main specificandum 
Naming an aspect 

Characterization of the specificandum under the aspect named 


Naming a relation (includes process, operation etc.) 
Naming an aspect of the relation 

Characterization of the relation under aspect named 


Naming a relatum or thing related (a new specificandum) 
Naming an aspect of the relatum 

Characterization of relatum under aspect named 


Naming a part 

Naming an aspect of the part 

Characterization of the part under aspect named 


(The naming of aspects may be merely implicit but it is always present in 
principle.) 

² On the notion of the “determinable,” which is due to W. E. Johnson, see his Logic, Cambridge University Press (1921), Part I, p. xxxv and Chapter XI.
Thus it appears that if specification is pressed far enough it always ultimately becomes specification under aspects. Aspect and determinable may be regarded as synonyms.

Current Examples of Derivative Specifications and Check Lists 

At the present time it will be found that we have very many products of 
research which take forms capable of being classified as some kind of derivative 
specification or (derivative) check list in the senses in which these expressions 
have been explained. 

I have distinguished more than twenty different logical types of derivative specification or check list which are exemplified in the current literature of educational research and related subjects. However, space will not permit exhibition of examples of these different types.

Part II. Validation of Derivative Specifications and Check Lists 

Many research products may be classified as derivative specifications or check lists, derivative in the sense that they have been derived from a group of documents (books, articles, journals, newspapers, courses of study, etc.) through analysis of their content. Such source documents themselves we shall call specifications or groups of specifications.

The only validation problem raised here is the question whether the resulting check list or derivative specification truly represents the class of source specifications used. The further question whether the class of source specifications itself constitutes a satisfactory source is not discussed.

From this point of view, if a check list or derivative specification is based in 
some suitable manner on all the documents of the class represented, no real 
validation problem arises; the validity has to be regarded as perfect. 

It may often happen that the investigator does not wish to analyse all of the specifications of the class in question but prefers to save time and labor by confining his analysis to a select group drawn from the total class as a sample. In this case the problem arises as to how far results based on such a sample should be judged to be truly representative of the entire class of specifications (most of which have not been analysed). A problem of this nature may be called the problem of validity for this kind of work.

Such a validation problem appears to take the same form whether the product 
to be validated is a derivative specification or (derivative) check list. Accord- 
ingly we shall for the sake of brevity carry on the discussion by referring to the 
problem as that of validating (derivative) check lists. The same principles 
would apply if the product happened to be a derivative specification. 

In order to consider the validity of a check list based on a sample group of specifications (called here a Sample Check List), we may hypothesize a check list based in the same manner on the entire class of specifications from which the sample was drawn. Such a hypothetical check list (which is not made) will be called the Ideal Check List. Then the problem of validity may be conceived as the question of how far the content of the Sample Check List agrees with the unknown content of the Ideal Check List.

An overlapping of the two appears ordinarily to be certain, but a failure of
complete coincidence is very highly probable. The question is what degree of
coincidence is to be expected.

This general validity problem naturally divides into two separate questions. 
The first question asks what proportion of the content of the Sample Check 
List may be expected to be present also in the Ideal Check List; this may be 
called the (sub-) problem of reliability. The second question asks what proportion
of the content of the Ideal Check List may be expected to be present in the
Sample Check List; this may be called the (sub-) problem of completeness. 
The answers to these two problems, if expressed in numerical percentages, could 
be called the Index of Reliability and Index of Completeness respectively. 
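If the Ideal Check List were actually known, both indices would reduce to simple proportions of the overlap between the two lists. The following minimal Python sketch assumes, counterfactually, that both lists are known; the function name and the item names are hypothetical and serve only to show what each index measures. In practice the Ideal Check List is never known, which is why the indices have to be estimated by the probability reasoning developed in this paper.

```python
def reliability_and_completeness(sample_list, ideal_list):
    """Return (Index of Reliability, Index of Completeness) as percentages.

    Assumes both lists are fully known, which is never the case in practice.
    """
    sample, ideal = set(sample_list), set(ideal_list)
    common = sample & ideal
    reliability = 100.0 * len(common) / len(sample)   # share of Sample list also in Ideal list
    completeness = 100.0 * len(common) / len(ideal)   # share of Ideal list also in Sample list
    return reliability, completeness


# Hypothetical items, for illustration only.
sample_items = ["flashing", "grout", "lintels", "marble base"]
ideal_items = ["flashing", "grout", "lintels", "sash cord", "terrazzo"]
print(reliability_and_completeness(sample_items, ideal_items))  # (75.0, 60.0)
```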

We shall first consider these two problems in their simplest form and after- 
ward in a more complex form in which they exhibited themselves in a recent 
study by the writer.³ The simple case presents no great difficulty and it is
possible that a different method of disposing of it might be preferred. The more 
complex case, however, appears to be rather difficult of solution and the writer 
has not been able to find in the literature any developed technique for handling 
it. The simple case is presented here primarily because it affords, by further 
extension, a successful approach to the difficult problem of the more com- 
plex case. 


Simple Case 

Terms and Symbols 

The “class of specifications” will be understood to consist of all specifications
which belong to the whole class of specifications regarded as a source, a class
which we claim to represent in our final product. In this problem the “class”
will not be regarded as indefinitely large but as consisting of a definite number 
of specifications, a number to be ascertained by actual count or by careful 
estimate. 

“Sample specifications” are the limited group selected from the class for 
purposes of actual analysis, and which play the role of representing the whole 
class. The remaining specifications of the class are not analyzed. 

“Sample Check List Material” is a name for the assemblage of all the different
items found in one or more sample specifications. 

“Ideal Check List Material” is a name for a hypothetical assemblage of all
the different items found in one or more specifications in the class. Only those
appearing in some sample specifications can actually be known; the rest are
hypothetical.

3 Byrne, L. Check List Materials for Public School Building Specifications. Teachers 
College, Columbia University. 1931. 
Write

M (constant) = total number of specifications in class 
N (variable) = number of these specifications in which a particular item 
under consideration appears (this number is hypothetical and some 
of the particular items themselves are hypothetical) 
m (constant) = number of sample specifications 

n (variable) = number of sample specifications in which a particular 
(the same) item appears 

Values of n may be expected to vary for different items, from m to 0 by
intervals of 1, the zero value appertaining to any item wholly absent from the Sample
Check List Material (hypothetically present in Ideal Check List Material).

Values of N might be expected to vary, for different items, from M to 1 by
intervals of 1. But in this problem the convention will be adopted that the
range is from M downward by intervals of $M/m$. Thus if the number M should
be five times as large as the number m, then the range for N would be treated
as proceeding from M downward by intervals of 5: $M, M - 5, M - 10, \ldots, 5$.
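The convention can be sketched in a few lines of Python. This assumes, as in the example, that M is an exact multiple of m; `ideal_cell_values` is a hypothetical helper name. Each value N = kM/m then pairs with the sample value n = k, which is what makes the Ideal and Sample tabulations comparable cell by cell.

```python
def ideal_cell_values(M, m):
    """Values of N allowed by the convention: from M downward by steps of M/m.

    This sketch assumes M is an exact multiple of m, as in the example (M = 5m).
    """
    if M % m != 0:
        raise ValueError("this sketch assumes M is a multiple of m")
    step = M // m
    return list(range(M, 0, -step))


print(ideal_cell_values(50, 10))  # [50, 45, 40, 35, 30, 25, 20, 15, 10, 5]
```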

A “tabulation” will mean a statistical table showing how many different
items appear in every possible number of specifications. A tabulation must be
made by actual count for the items of the sample specifications, and will show
the number of items having each possible value of n. A similar tabulation is
hypothetical for the items in all the specifications of the class, that is, for the
number of items having each value of N permitted by the convention of the
last paragraph.

“Tabulation cell” (or simply “cell”) will mean, as needed, either the number
of items or the group of items appearing in any designated number of
specifications. For Sample Check List Material it will be the number or group of
items to which a particular value of n appertains; for Ideal Check List similarly
the number of items or group of items to which a particular value of N appertains
(hypothetically).
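The tabulation for Sample Check List Material is a simple count. A minimal Python sketch, assuming each sample specification is represented as a collection of item names (the representation, the function name and the specifications themselves are hypothetical). Cell n = 0 cannot be filled by counting, since items absent from every sample specification are unknown.

```python
from collections import Counter


def tabulate(sample_specs):
    """Map each n to the number of different items found in exactly n sample specs."""
    appearances = Counter(item for spec in sample_specs for item in set(spec))
    cells = Counter(appearances.values())
    return dict(sorted(cells.items(), reverse=True))


# Three hypothetical sample specifications.
specs = [
    {"grout", "lintels", "flashing"},
    {"grout", "lintels"},
    {"grout", "sash cord"},
]
print(tabulate(specs))  # {3: 1, 2: 1, 1: 2}
```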

“Sample Check List” will mean a list of items selected from the Sample
Check List Material according to some adopted criterion. For illustrative
purposes we shall consider this criterion to be, for example, a minimum value
of the numerical ratio $n/m$.

“Ideal Check List” will mean a list of items selected from the Ideal Check
List Material according to some adopted criterion. For illustrative purposes
we shall consider this criterion to be a minimum value of the numerical ratio $N/M$.
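Selecting a Sample Check List by such a criterion can be sketched as follows. The threshold of one half is a hypothetical choice made only for illustration, as are the function name and the specifications; the criterion itself is left open here.

```python
from collections import Counter
from fractions import Fraction


def sample_check_list(sample_specs, min_ratio=Fraction(1, 2)):
    """Items whose ratio n/m reaches min_ratio (min_ratio = 1/2 is a hypothetical choice)."""
    m = len(sample_specs)
    appearances = Counter(item for spec in sample_specs for item in set(spec))
    return sorted(item for item, n in appearances.items() if Fraction(n, m) >= min_ratio)


specs = [
    {"grout", "lintels", "flashing"},
    {"grout", "lintels"},
    {"grout", "sash cord"},
]
print(sample_check_list(specs))  # ['grout', 'lintels']
```

Lowering `min_ratio` lengthens the list; the same rule applied to N/M would define the (hypothetical) Ideal Check List.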

Problem of Reliability 

The problem of reliability may be restated and renamed the General Reliability
Problem. This may be broken up into a group of problems which will
be called Elementary Reliability Problems. Each of the latter may be in turn
broken up into a group of problems which will be called Ultimate Reliability 
Problems. Each Ultimate Reliability Problem may be solved directly. Com- 
bination of these solutions will yield solutions of the Elementary Reliability 
Problems. Combinations of the latter solutions will finally yield the solution 
of the General Reliability Problem. 

These problems will now be stated:

General Reliability Problem : What proportion of the items present in Sample 
Check List may be expected to be present also in Ideal Check List? 

Elementary Reliability Problem: What proportion of the items in a particular
cell in Sample Check List may be expected to be present also in Ideal Check
List?

Ultimate Reliability Problem: What proportion of the items in a particular
cell in Sample Check List may be expected to be present also in some designated
cell in Ideal Check List?

To solve an Ultimate Problem: 

From the Fundamental Theorem in the Theory of Inductive Probability
(Whittaker, E. T. and Robinson, G. The Calculus of Observations. London:
Blackie & Son. 1924. p. 305) the solution may be expressed as

$$\frac{P_R\, p_R}{\sum P p}.$$

Whittaker and Robinson's statement of the Fundamental Theorem in the
Theory of Inductive Probability is as follows (form slightly changed without
change in meaning):

“Suppose that a certain observed phenomenon may be accounted for by any
one of a certain number of hypotheses, of which one, and not more than one,
must be true: suppose moreover that the probability of the R-th hypothesis,
as based on information in our possession before the phenomenon is observed,
is $P_R$, while the probability of the observed phenomenon, on the assumption of
the truth of the R-th hypothesis, is $p_R$. Then when the observation of the
phenomenon is taken into consideration, the probability of the R-th hypothesis is

$$\frac{P_R\, p_R}{\sum P p},$$

where the symbol $\sum$ denotes the summation over all the hypotheses.”
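The theorem amounts to multiplying each prior probability by the corresponding probability of the observation and renormalizing. A minimal Python sketch with three hypothetical hypotheses (the numbers are invented for illustration); exact fractions keep the arithmetic transparent.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Fundamental Theorem: P_R p_R / sum(P p) for every hypothesis R."""
    joint = [P * p for P, p in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical prior probabilities P and observation probabilities p.
priors = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
likelihoods = [Fraction(1, 10), Fraction(1, 5), Fraction(3, 5)]
print(posterior(priors, likelihoods))  # [Fraction(3, 13), Fraction(4, 13), Fraction(6, 13)]
```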

It is clear that an Ultimate Reliability Problem is a case falling under this
Fundamental Theorem. The observed phenomenon is any item occurring in
any specified cell of Sample Check List, say cell n = s. It may be accounted
for by a certain number of hypotheses as to its source in the Ideal Check List
Material; the different cells in the Ideal Check List Material are these different
hypotheses of origin, hypothetical because we do not know from which one it
has come but only that it must have come from some one of them; the cell from
which it actually comes is the true hypothesis, though we do not know which
one that is. That the origin of the item is in cell N = R is the R-th hypothesis,
and its probability is written $P_R$. The probability of the occurrence of the
phenomenon on the assumption of the truth of the R-th hypothesis is the
probability that an item in cell N = R will appear in Sample Check List in cell n = s,
and its probability is written $p_R$. As we clearly have in our Ultimate Reliability
Problem a case falling under the Fundamental Theorem quoted, we may accept
as the required solution of the Ultimate Reliability Problem the formula already
given in the initial statement:

$$\frac{P_R\, p_R}{\sum P p}.$$

* For the fundamental position of this theorem in a theory of science and for its proof
one may also consult Jeffreys, H. Scientific Inference. Cambridge: Cambridge University
Press. 1931. Chapter II (section 2.34).

This expresses the probability that any item found in Sample-Check-List cell
n = s comes from (and appears in) Ideal-Check-List-Material cell N = R, or
it gives the proportion of items found in Sample-Check-List cell n = s that
may be expected to come from (or appear in) Ideal-Check-List-Material cell
N = R.

Meaning of any value of P (say $P_R$) = the probability that any item, drawn
at random from those cells of Ideal Check List Material which are possible
sources of items in Sample-Check-List cell n = s, will happen to be drawn from
cell N = R.

Meaning of any value of p (say $p_R$) = the probability that any item in
Ideal-Check-List cell N = R will also be present in Sample-Check-List cell n = s.
(Important: this supposition is not equivalent to its converse.)

Evaluation of $P_R$

$$P_R = \frac{\text{number of items in cell } N = R}{\text{number of items in all cells which are possible sources of items in cell } n = s}$$

For this ratio it is necessary to assume that the shape of the numerical curve
formed by the group of Ideal-Check-List-Material cells is the same as that of
the numerical curve formed by the group of Sample-Check-List-Material cells.
On this assumption we may replace the numerator by the number of items in
the Sample-Check-List-Material cell having an abscissa corresponding to that
of the Ideal-Check-List-Material cell N = R, and replace the denominator by
the sum of the numbers of items in all the cells with abscissae corresponding to
those of Ideal-Check-List-Material cells which are possible sources of items in
cell n = s.
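The substitution can be sketched in Python under stated assumptions: M is a multiple of m; Ideal cell N = kM/m is taken to "correspond" to sample cell n = k; and a cell N is a possible source of sample cell n = s exactly when an item in N of the M specifications can appear in s of the m sampled ones, that is when s ≤ N ≤ M − m + s. The function name and the tabulation are hypothetical.

```python
from fractions import Fraction


def prior_weights(sample_tab, M, m, s):
    """P_R for every Ideal-Check-List-Material cell that can produce sample cell n = s.

    sample_tab maps n (1..m) to a number of items; Ideal cell N = k*M/m is read
    off sample cell n = k (an assumption of this sketch).
    """
    step = M // m
    # N can yield an item seen in exactly s of m samples only if s <= N <= M - m + s.
    sources = [k for k in range(1, m + 1) if s <= k * step <= M - m + s]
    total = sum(sample_tab.get(k, 0) for k in sources)
    return {k * step: Fraction(sample_tab.get(k, 0), total) for k in sources}


tab = {4: 2, 3: 3, 2: 5, 1: 10}  # hypothetical tabulation, m = 4
print(prior_weights(tab, M=20, m=4, s=2))
# {5: Fraction(5, 9), 10: Fraction(5, 18), 15: Fraction(1, 6)}
```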

Evaluation of $p_R$

By the aid of “the definition of probability which is used in practically all
treatises on the subject” (Coolidge, J. L. An Introduction to Mathematical
Probability. Oxford: Oxford University Press. 1925. p. 4) and the principle
underlying the Theory of Combinations (Whitworth, W. A. Choice and Chance.
New York: G. E. Stechert & Co. 1927. Proposition II) we are able to arrive
at the evaluation:

$$p = \frac{C^{M-N}_{m-n}\, C^{N}_{n}}{C^{M}_{m}},$$

in which, for any p (say $p_R$), we employ for N the value N = R, and for n the
value n = s. As the denominator later cancels out it may be disregarded
throughout, simplifying the formula to

$$p = C^{M-N}_{m-n}\, C^{N}_{n}.$$

(A symbol such as $C^{N}_{n}$ is read “the number of combinations of N things taken
n at a time”; it is also written in several other forms.)

The definition referred to may be worded as follows (Coolidge's own preferred
definition is not quite the same):

“An event can happen in a certain number of ways, which are all equally
likely. A certain proportion of these are classed as favorable. The ratio of 
the number of favorable ways to the total number is called the probability that 
the event will turn out favorably.” 

The principle underlying the Theory of Combinations may be quoted from 
Whitworth as follows (also found in ordinary works on algebra) : 

“If one operation can be performed in m ways, and then a second can be
performed in n ways, and then a third in r ways, (and so on), the number of ways
of performing all the operations will be m × n × r × etc.”

If it is not at once clear that the formula for evaluation of p follows from the 
definition and principle just quoted, the following considerations should make 
it evident. 

We are working in terms of a particular item belonging to a particular
Ideal-Check-List-Material cell, say cell N = R. “Favorable” occurrence requires
that this item fall in a particular Sample-Check-List cell, say n = s, while
falling in any other Sample-Check-List-Material cell (including cell n = 0 for
absence) is “unfavorable.” Again, the real meaning of the “favorable”
occurrence is that the item will be found in just n = s out of the m specifications of
the sample, and absent in the remaining m − n specifications of the sample.
Moreover, presence in Ideal-Check-List-Material cell N = R means that the
item occurs in just N = R of the M specifications that constitute the whole
class and is absent in M − N of these specifications. The total number of all
the ways (favorable and unfavorable) in which our event can happen means the
same as the total number of all the ways in which a group of m specifications
can be selected from a larger group of M, and this is, of course, written $C^{M}_{m}$ and
given us in our denominator. The number of favorable ways in which our
event can happen means the same as the number of ways in which N
specifications containing the item can form groups of n specifications while at the
same time M − N specifications not containing the item can form groups of
m − n specifications; the first distribution can be done in $C^{N}_{n}$ ways and the
second in $C^{M-N}_{m-n}$ ways, so by Whitworth's principle the number of ways in which
these things can happen simultaneously is $C^{N}_{n}\, C^{M-N}_{m-n}$. Assembling numerator
and denominator we have the formula initially stated for evaluation of p, viz.:

$$p = \frac{C^{M-N}_{m-n}\, C^{N}_{n}}{C^{M}_{m}}.$$

This is the general formula; in applying it to the particular example N = R, n = s,
the replacements for N and n, of course, give

$$p_R = \frac{C^{M-R}_{m-s}\, C^{R}_{s}}{C^{M}_{m}}.$$

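The argument can be checked by direct counting: list every equally likely sample of m specifications out of M, as in the definition of probability quoted above, and count those containing exactly n of the N specifications that carry the item. A minimal Python sketch; the function names and the small values of M, m, N and n are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def p_formula(M, m, N, n):
    """p = C(M-N, m-n) C(N, n) / C(M, m)."""
    return Fraction(comb(M - N, m - n) * comb(N, n), comb(M, m))


def p_by_counting(M, m, N, n):
    """Same probability by listing every equally likely sample of m specs out of M."""
    samples = list(combinations(range(M), m))  # specs 0..N-1 contain the item
    favorable = sum(1 for sample in samples if sum(j < N for j in sample) == n)
    return Fraction(favorable, len(samples))


print(p_formula(8, 3, 4, 2), p_by_counting(8, 3, 4, 2))  # 3/7 3/7
```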
Having a means of evaluating P and p we may solve all needed Ultimate
Problems. The resulting solutions of the needed Ultimate Reliability Problems
(not necessarily completed) enable us to arrive at the solution of any needed
Elementary Reliability Problem in the form of a percentage which may be
called an Index of Reliability for the Sample-Check-List cell in question. In
computing this percentage we distinguish source-cells that belong to the Ideal
Check List from other source-cells that belong to the Ideal Check List Material
but not to the Ideal Check List.

By properly averaging cell-Indices of Reliability (which are really Indices of
Reliability for the individual items in the cells) we may obtain a solution of the
General Problem of Reliability in the form of an Average Index of Reliability
for the Sample Check List as a whole.
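Putting the pieces together, a cell-Index of Reliability is the sum of $P_R p_R / \sum P p$ over those source cells that belong to the Ideal Check List. The Python sketch below rests on the same hypothetical assumptions as before (M a multiple of m, Ideal cell N = kM/m read off sample cell n = k), and it takes "properly averaging" to mean weighting each cell by its number of items; that weighting is one reasonable reading, not necessarily the writer's. All names and numbers are hypothetical. Because the common denominators of P and p cancel in the ratio, raw counts and products of combinations suffice.

```python
from fractions import Fraction
from math import comb


def cell_reliability(sample_tab, M, m, s, ideal_min_N):
    """Index of Reliability for Sample-Check-List cell n = s, as a fraction.

    Source cells with N >= ideal_min_N are counted as belonging to the Ideal Check List.
    """
    step = M // m
    sources = [k * step for k in range(1, m + 1) if s <= k * step <= M - m + s]
    joint = {N: sample_tab.get(N // step, 0) * comb(N, s) * comb(M - N, m - s) for N in sources}
    in_ideal = sum(v for N, v in joint.items() if N >= ideal_min_N)
    return Fraction(in_ideal, sum(joint.values()))


def average_reliability(sample_tab, M, m, sample_min_n, ideal_min_N):
    """Average Index over the Sample Check List (cells n >= sample_min_n),
    weighting each cell by its number of items (an assumed reading of "properly")."""
    cells = [s for s in sample_tab if s >= sample_min_n]
    n_items = sum(sample_tab[s] for s in cells)
    weighted = sum(sample_tab[s] * cell_reliability(sample_tab, M, m, s, ideal_min_N) for s in cells)
    return weighted / n_items


tab = {4: 2, 3: 3, 2: 5, 1: 10}  # hypothetical tabulation, m = 4, M = 20
for s in sorted(tab, reverse=True):
    print(s, float(cell_reliability(tab, 20, 4, s, ideal_min_N=10)))
print("average:", float(average_reliability(tab, 20, 4, sample_min_n=2, ideal_min_N=10)))
```

Changing `sample_min_n` while keeping `ideal_min_N` fixed gives Average Indices for a series of briefer Sample Check Lists.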

In addition to the Average Index of Reliability for the Sample Check List
we may easily secure also Average Indices of Reliability for any series of briefer
Sample Check Lists selected from the Sample Check List, by properly averaging
the Indices of cells contained in any Sample Check List in question, keeping
the original criterion for Ideal Check List.

In practice it may not be necessary to compute all cell-Indices, as a portion
of these may be entered in tables by any methods of interpolation regarded as
acceptable.

Problem of Completeness 

Again we have General, Elementary, and Ultimate Problems. These may
be stated as follows : 

General Completeness Problem: What proportion of the items present in
Ideal Check List may be expected to be present also in Sample Check List? 

Elementary Completeness Problem: What proportion of the items present in
Ideal Check List may be expected to be present also in some designated cell in
Sample Check List?

Ultimate Completeness Problem : What proportion of the items in a particu- 
lar cell in Ideal Check List may be expected to be present also in some designated 
cell in Sample Check List? 
To solve an Ultimate Problem:

From principles already used the proportion to be expected is the same as the
value of p alone in an Ultimate Reliability Problem, viz.:

$$p = \frac{C^{M-N}_{m-n}\, C^{N}_{n}}{C^{M}_{m}}.$$

By the use of this formula we may solve the Ultimate Problems for all values
of N represented in Ideal Check List and all values of n represented in Sample
Check List; some of these solutions will have a value of zero.

For each value of n, if we properly average the solutions of the Ultimate
Problems, we obtain a solution of the Elementary Problem for one
Sample-Check-List cell in the form of a percentage which may be called the Index of
Completeness for the particular Sample-Check-List cell. In securing this
average it is necessary to multiply each Ultimate Problem solution by a relative
number corresponding to the assumed ratio of number of items in the particular
Ideal-Check-List cell to the number of items in all the Ideal-Check-List cells.
The source of the assumed relative numbers is the same as that used in
evaluating P in the Reliability Problem.
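The weighted average for one Sample-Check-List cell can be sketched in Python. The relative numbers for the Ideal-Check-List cells are assumed here to be read off a sample tabulation, as for $P_R$; the weights, function name and values of M and m are hypothetical.

```python
from fractions import Fraction
from math import comb


def cell_completeness(ideal_weights, M, m, n):
    """Index of Completeness for Sample-Check-List cell n, as a fraction.

    ideal_weights maps each Ideal-Check-List cell N to a relative number of items.
    Each Ultimate solution p = C(M-N, m-n) C(N, n) / C(M, m) is weighted by it.
    """
    total_w = sum(ideal_weights.values())
    return sum(
        Fraction(w, total_w) * Fraction(comb(M - N, m - n) * comb(N, n), comb(M, m))
        for N, w in ideal_weights.items()
    )


weights = {10: 5, 15: 3, 20: 2}  # hypothetical: Ideal cells N >= M/2 with M = 20, m = 4
for n in range(4, -1, -1):
    print(n, cell_completeness(weights, 20, 4, n))
```

The cell indices for n = m down to n = 0 add up to exactly 1, since every item of the Ideal Check List must land in some Sample-Check-List-Material cell.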

When we have an Index of Completeness for each Sample-Check-List cell
we may obtain a Total Index of Completeness for the Sample Check List as a
whole by summing the cell-indices of Completeness of all the cells of the Sample
Check List. By an equivalent but preferable method we may divide the
last-named result by the sum of the cell-indices of Completeness of all the cells of
the Sample Check List Material (including cell n = 0); by this method the
denominator $C^{M}_{m}$ of the original formula cancels out and so may be disregarded throughout.

A Total Index of Completeness is similarly obtainable for a Sample Check
List (any Sample Check List selected from the Sample Check List) by summing
the cell-indices of Completeness of the appropriate cells. Thus, if desired, a
tabulation may be made showing Indices of Completeness for a series of Sample
Check Lists differing in extent.

A combined tabulation may show for each of a series of Sample Check Lists 
its Index of Reliability and its Index of Completeness. 
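The "preferable method" can be sketched in Python: the unnormalized cell indices (with $C^{M}_{m}$ left out) are summed over the cells of a Sample Check List and divided by their sum over all cells n = 0, ..., m. That full sum equals the total weight times $C^{M}_{m}$, so both factors cancel. Weights, names and the values of M and m are hypothetical.

```python
from fractions import Fraction
from math import comb


def unnormalized_cell_index(ideal_weights, M, m, n):
    """Weighted sum of C(M-N, m-n) C(N, n): the cell index with C(M, m) left out."""
    return sum(w * comb(M - N, m - n) * comb(N, n) for N, w in ideal_weights.items())


def total_completeness(ideal_weights, M, m, sample_min_n):
    """Total Index of Completeness for the Sample Check List of cells n >= sample_min_n."""
    listed = sum(unnormalized_cell_index(ideal_weights, M, m, n) for n in range(sample_min_n, m + 1))
    every_cell = sum(unnormalized_cell_index(ideal_weights, M, m, n) for n in range(m + 1))
    return Fraction(listed, every_cell)


weights = {10: 5, 15: 3, 20: 2}  # hypothetical relative numbers, M = 20, m = 4
for min_n in range(4, 0, -1):  # a series of Sample Check Lists differing in extent
    print(min_n, float(total_completeness(weights, 20, 4, min_n)))
```

Lowering `min_n` lengthens the Sample Check List and raises its completeness; a combined tabulation would set these values beside the corresponding Indices of Reliability.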

More Complex Case 

So far we have considered a validation problem of simple type. In the writer's
Check List Materials for Public School Building Specifications⁵ a more complex
problem was presented, due to the introduction of the concept of the Applicable
Case. A Check List for School Building Specifications was developed with a
view to its use by school officials or others as an aid in judging proposed school
building specifications with reference to their completeness or incompleteness
of determination. The position was taken that a new specification ought not
to be charged with the omission of a given item unless the building (as
represented by the specification) had an Applicable Case for that item. To give a
single example, the Check List contains various items relating to the specifying
of marble work. It did not seem appropriate to score a specification down for
the omission of numerous determinations in marble work, if in fact there was no
marble in the building to be determined. This situation is expressed by saying
that there are no Applicable Cases for those items.

⁵ Byrne, L. Check List Materials for Public School Building Specifications. Teachers
College, Columbia University. 1931.

It seems likely that there are other research problems in which the question
ought to be raised whether adequate treatment does not require the introduction
of the concept of the Applicable Case. If so, a more difficult validation problem
is presented than would otherwise be the case.

In the more complex case indicated, the solution is obtained by making the
necessary extensions in the procedures followed for the simple case.


Modifications in Terms and Symbols

M (constant) = total number of specifications in class
D (variable) = number of these specifications containing an Applicable Case
for a particular item
N (variable) = number of the latter specifications which also contain the
particular item
m (constant) = number of specifications in sample
d (variable) = number of these specifications containing an Applicable Case
for the particular item
n (variable) = number of the latter specifications which also contain the
particular item


Values of d range from m to 0 by intervals of 1, and those of n range from d
to 0 by intervals of 1.

The convention is adopted that values of D range from M downward, and
those of N from D downward, by intervals of $M/m$.

(Tabulation) cell will mean the number of items (or the group of items) having 
a common value of d and a common value of n. 

The criterion for membership in the Sample Check List may, for illustrative
purposes, be taken as a minimum value of a numerical ratio involving n.

The criterion for membership in the Ideal Check List may, for illustrative
purposes, be taken as a minimum value of a numerical ratio involving N.


Problem of Reliability 

Following the same principle and line of reasoning as for the simple case we 
arrive at the same general formula for the solution of an Ultimate Reliability 
Problem, viz. : 

$$\frac{P_R\, p_R}{\sum P p}.$$
Meanings of values of P and p are the same as before except that cells must be
described respectively in terms of n and d values instead of n values alone, or
N and D values instead of N values alone.

$P_R$ is evaluated in the same manner as before, using the new meaning of
“cell.”

For p, the evaluation now becomes

$$p = \frac{C^{M-D}_{m-d}\, C^{D-N}_{d-n}\, C^{N}_{n}}{C^{M}_{m}},$$

which through cancellation may be simplified to the working formula

$$p = C^{M-D}_{m-d}\, C^{D-N}_{d-n}\, C^{N}_{n}.$$

The reasoning leading to the denominator is unchanged and so this
denominator itself remains unchanged. The numerator for the evaluation of p is
altered to the extent shown by the consideration that, in producing “favorable”
ways, we now have to do with the number of simultaneous possibilities of
drawing n specifications from a group of N specifications containing a particular item,
drawing d − n specifications from a group of D − N specifications which
contain an Applicable Case for this particular item but do not contain this item
itself, and of drawing m − d specifications from a group of M − D specifications
which contain no Applicable Case for the item.
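The three-way count can again be checked by enumerating every equally likely sample. In this minimal Python sketch the M specifications are split into hypothetical blocks: the first N contain the item, the next D − N have an Applicable Case but lack the item, and the remaining M − D have no Applicable Case. The function names and small parameter values are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def p_complex(M, D, N, m, d, n):
    """p = C(M-D, m-d) C(D-N, d-n) C(N, n) / C(M, m)."""
    return Fraction(comb(M - D, m - d) * comb(D - N, d - n) * comb(N, n), comb(M, m))


def p_complex_by_counting(M, D, N, m, d, n):
    """Enumerate every sample of m specs: specs 0..N-1 contain the item, N..D-1 have
    an Applicable Case but lack the item, D..M-1 have no Applicable Case."""
    samples = list(combinations(range(M), m))
    favorable = sum(
        1 for smp in samples
        if sum(j < N for j in smp) == n and sum(j < D for j in smp) == d
    )
    return Fraction(favorable, len(samples))


print(p_complex(7, 5, 2, 3, 2, 1), p_complex_by_counting(7, 5, 2, 3, 2, 1))  # 12/35 12/35
```

The Ultimate Completeness formula for the complex case uses the same expression, so `p_complex` serves for that problem as well.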

Problem of Completeness

Following the same principles and line of reasoning as for the simple case we
arrive at the following formula for the solution of an Ultimate Completeness
Problem:

$$\frac{C^{M-D}_{m-d}\, C^{D-N}_{d-n}\, C^{N}_{n}}{C^{M}_{m}}.$$

By suitable treatment bringing about cancellations the working formula may
be reduced to

$$C^{M-D}_{m-d}\, C^{D-N}_{d-n}\, C^{N}_{n}.$$

Techniques and Aids in Computation 

The present paper is limited to an attempt to explain with adequate fullness
the proposed theory of validation for derivative specifications and check lists,
and space is lacking in which to exhibit techniques of actual computation. One
specimen problem worked out in fairly complete detail, together with remarks
on available aids in computation, will be found in Appendix A3 in typewritten
copies of the writer's “Check List Materials for Public School Building
Specifications” on file in the Library of Teachers College, Columbia University; the
Appendices are not included in the printed edition.



A NOTE ON SHEPPARD’S CORRECTIONS 

By Solomon Kullback 

In this note we shall derive a simple relation between the characteristic 
function of the grouped distribution and the characteristic function of the 
original continuous distribution, assuming that the frequency curve has high 
contact with the x-axis at both ends. 

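If we set $p_\nu = \int_{x_\nu - h/2}^{x_\nu + h/2} f(x)\,dx$ (the $x_\nu$ being the mid-points of the
class intervals, of common width $h$), then the characteristic function of the
grouped distribution is given by

$$\varphi(t) = \sum_\nu p_\nu e^{itx_\nu}, \tag{1}$$

where $i = \sqrt{-1}$. Replacing $p_\nu$ by its value as given above, we have

$$\varphi(t) = \sum_\nu \int_{-h/2}^{h/2} e^{itx_\nu} f(x + x_\nu)\,dx = \int_{-h/2}^{h/2} e^{-itx} \Big[\sum_\nu e^{it(x + x_\nu)} f(x + x_\nu)\Big]\,dx. \tag{2}$$

There is no difficulty about justifying the inversion of the order of integration
and summation.

Because of the assumption of high contact with the axis of x at both ends of
the frequency curve, we have

$$\psi(t) = \int_{-\infty}^{\infty} e^{itx} f(x)\,dx = h \sum_\nu e^{it(x + x_\nu)} f(x + x_\nu), \tag{3}$$

so that

$$\varphi(t) = \frac{2}{ht} \sin\frac{ht}{2}\; \psi(t). \tag{4}$$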

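This is the desired result, from which there follow the desired moment
relations by equating coefficients of $(it)^r$ on both sides of the equation, the $\mu_r$
being the moments of the grouped distribution and the $m_r$ those of the original
continuous distribution. For example:

$$1 + \mu_1 (it) + \mu_2 \frac{(it)^2}{2!} + \mu_3 \frac{(it)^3}{3!} + \cdots = \left[1 + \left(\frac{h}{2}\right)^2 \frac{(it)^2}{3!} + \left(\frac{h}{2}\right)^4 \frac{(it)^4}{5!} + \cdots\right]\left[1 + m_1 (it) + m_2 \frac{(it)^2}{2!} + m_3 \frac{(it)^3}{3!} + \cdots\right]$$

$$= 1 + m_1 (it) + \left(m_2 + \frac{h^2}{12}\right)\frac{(it)^2}{2!} + \left(m_3 + \frac{h^2}{4} m_1\right)\frac{(it)^3}{3!} + \cdots,$$

or

$$\mu_1 = m_1, \qquad \mu_2 = m_2 + \frac{h^2}{12}, \qquad \mu_3 = m_3 + \frac{h^2}{4} m_1, \qquad \ldots$$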
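The relation $\mu_2 = m_2 + h^2/12$ can be checked numerically. A minimal Python sketch, assuming a standard normal curve (which has high contact with the axis at both ends, with $m_1 = 0$ and $m_2 = 1$), an arbitrary class width h = 0.5 and an arbitrary shift of the class mid-points; `normal_cdf` and `grouped_moments` are hypothetical helpers, not part of the note. For so smooth a curve the grouped moments agree with the corrected values to many decimal places.

```python
import math


def normal_cdf(x):
    """Distribution function of the standard normal curve (m_1 = 0, m_2 = 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def grouped_moments(h, offset=0.0, span=20.0):
    """First two raw moments of a standard normal grouped into classes of width h.

    Class mid-points are x_nu = offset + nu*h, and p_nu is the probability of
    [x_nu - h/2, x_nu + h/2], as in the definition of p_nu.
    """
    k = int(span / h)
    mu1 = mu2 = 0.0
    for nu in range(-k, k + 1):
        x = offset + nu * h
        p = normal_cdf(x + h / 2) - normal_cdf(x - h / 2)
        mu1 += p * x
        mu2 += p * x * x
    return mu1, mu2


h = 0.5
mu1, mu2 = grouped_moments(h, offset=0.1)
print(round(mu1, 10), round(mu2, 10), round(1 + h * h / 12, 10))
```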


Washington, D. C. 



THE LIMITING DISTRIBUTIONS OF CERTAIN STATISTICS*


By J. L. Doob 

There have been many advances in the theory of probability in recent years,
especially relating to its mathematical basis. Unfortunately, there appears to
be no source readily available to the ordinary American statistician which
sketches these results and shows their application to statistics. It is the purpose
of this paper to define the basic concepts and state the basic theorems of
probability, and then, as an application, to find the limiting distributions for large
samples of a large class of statistics. One of these statistics is the tetrad
difference, which has been of much concern to psychologists.

I 

Let $F(x)$ be a monotone non-decreasing function, continuous on the left,
defined at every point of the x-axis, and satisfying the conditions

$$\lim_{x \to -\infty} F(x) = 0, \qquad \lim_{x \to \infty} F(x) = 1. \tag{1}$$

Then the function $F(x)$ is said to be the distribution function of a chance variable
$\mathbf{x}$, and $F(x)$ is said to be the probability that $\mathbf{x} < x$. The curve $y = F(x)$ is
sometimes called the ogive in statistics. The chance variable $\mathbf{x}$ itself is merely
the function $x$, taken in conjunction with the monotone function $F(x)$.

If $\int_{-\infty}^{\infty} x\,dF(x)$ exists as an absolutely convergent Stieltjes integral, the value
of the integral is called the expectation of $\mathbf{x}$, and will be denoted by $E(\mathbf{x})$.
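A Stieltjes integral with respect to a step function reduces to a sum over the jumps. The following minimal Python sketch assumes a hypothetical fair die, whose distribution function is the left-continuous step function F(x) = (number of faces less than x)/6. It forms a Riemann-Stieltjes sum over a fine partition of [0, 7], tagging each subinterval at its left end; because the partition contains every jump point, the sum reproduces the expectation exactly.

```python
from fractions import Fraction


def die_cdf(x):
    """Left-continuous F(x), the probability that a fair die shows a face less than x."""
    return Fraction(sum(1 for face in range(1, 7) if face < x), 6)


def stieltjes_expectation(F, a, b, steps):
    """Riemann-Stieltjes sum of x dF(x) over [a, b], tagging each subinterval at its left end."""
    h = Fraction(b - a, steps)
    total = Fraction(0)
    for i in range(steps):
        t0, t1 = a + i * h, a + (i + 1) * h
        total += t0 * (F(t1) - F(t0))
    return total


print(stieltjes_expectation(die_cdf, 0, 7, 700))  # 7/2
```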

II 

Let $F(x_1, \ldots, x_n)$ be a function defined over n-dimensional space, which is
monotone, non-decreasing, continuous on the left in each coordinate if the others
are held fast, and which satisfies the conditions

$$\lim_{x_j \to -\infty} F(x_1, \ldots, x_n) = 0, \quad j = 1, \ldots, n, \qquad \lim_{x_1, \ldots, x_n \to \infty} F(x_1, \ldots, x_n) = 1, \tag{2}$$

where in the last limit $x_1, \ldots, x_n$ become infinite together. Then $F(x_1, \ldots, x_n)$
is said to be the distribution function of a set of chance variables $\mathbf{x}_1, \ldots, \mathbf{x}_n$,
and $F(x_1, \ldots, x_n)$ is said to be the probability that all the inequalities $\mathbf{x}_j < x_j$
($j = 1, \ldots, n$) hold simultaneously. It can be shown that the function
$F_j(x) = \lim F(\xi_1, \ldots, \xi_{j-1}, x, \xi_{j+1}, \ldots, \xi_n)$, the limit being taken as the $\xi_i$ ($i \neq j$)
become infinite together, is of the type discussed in §I. The function $F_j(x)$ is called
the distribution function of $\mathbf{x}_j$. The chance variables $\mathbf{x}_1, \ldots, \mathbf{x}_n$ are called independent
if $F(x_1, \ldots, x_n) = \prod_{j=1}^{n} F_j(x_j)$. The chance variables $\mathbf{x}_1, \ldots, \mathbf{x}_n$ are merely the
functions $x_1, \ldots, x_n$ defined over n-dimensional space, taken in conjunction with the
function $F(x_1, \ldots, x_n)$.

* Research under a grant-in-aid from the Carnegie Corporation.

If $a_1, \ldots, a_n$ are any real numbers, the number $F(a_1, \ldots, a_n)$, the probability
that $\mathbf{x}_j < a_j$, $j = 1, \ldots, n$, is also called the probability that a sample
$(\mathbf{x}_1, \ldots, \mathbf{x}_n)$ shall be in the region of n-dimensional space determined by
$x_j < a_j$, $j = 1, \ldots, n$. Thus regions of this special type have probabilities
attached to them. Using the usual additivity rules, probabilities can be
attached to more general regions, and in fact probability can be defined on a
collection C of regions including all open sets, closed sets and all sets which can
be obtained from them by repeatedly taking sums, products, and complements.
(Such point sets are called Borel measurable.) The resulting function of point
sets is non-negative and completely additive.²

If $f(x_1, \ldots, x_n)$ is any function of $x_1, \ldots, x_n$, let $E_x$ be the set of points
$(x_1, \ldots, x_n)$ where $f < x$. Suppose that $E_x$ is in the collection C for all values
of x, and let $F(x)$ be the probability attached to the set $E_x$. Then it is readily
seen that $F(x)$ has the properties discussed in §I and is therefore the distribution
function of a new chance variable $\mathbf{x}$, which will be denoted by $f(\mathbf{x}_1, \ldots, \mathbf{x}_n)$.
The chance variable $f(\mathbf{x}_1, \ldots, \mathbf{x}_n)$ is merely the function $f(x_1, \ldots, x_n)$ taken
in conjunction with the distribution function $F(x_1, \ldots, x_n)$. (An example is
$f(x_1, \ldots, x_n) = x_1 + \cdots + x_n$, determining the chance variable $\mathbf{x}_1 + \cdots + \mathbf{x}_n$.)
Suppose that $E(\mathbf{x})$ exists,

$$E(\mathbf{x}) = \int_{-\infty}^{\infty} x\,dF(x). \tag{3}$$

Then it can be shown that the n-dimensional (Lebesgue-)Stieltjes integral

$$\int \cdots \int f(x_1, \ldots, x_n)\,dF(x_1, \ldots, x_n) \tag{4}$$

exists and has the value $E(\mathbf{x})$. Conversely the existence of the integral (4)
implies that of (3).

If there is a Lebesgue-integrable function $\varphi(x_1, \ldots, x_n)$ such that

$$F(x_1, \ldots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} \varphi(\xi_1, \ldots, \xi_n)\,d\xi_1 \cdots d\xi_n,$$

the function $\varphi$ is said to be the density function of the distribution. In this
case (4) becomes

$$\int \cdots \int f(x_1, \ldots, x_n)\,\varphi(x_1, \ldots, x_n)\,dx_1 \cdots dx_n. \tag{4'}$$

The probability attached to a point set E in the collection C is the integral
(4) (or (4') if there is a density function), where f = 1 over E and f = 0
elsewhere.

² That is, if $\mu(E)$ is the value of the set function on the set E, and if $E_1, E_2, \ldots$ are
point sets with no common points, and which are in C, then $\mu\left(\sum_m E_m\right) = \sum_{m=1}^{\infty} \mu(E_m)$.



III

Let $x, x_1, x_2, \ldots$ be a sequence of chance variables. We suppose that for
every integer n, $x, x_n$ determine a bivariate distribution. Then it is readily
seen from §II that there is a chance variable $|x_n - x|$ and therefore that
$P\{|x_n - x| \leq \lambda\}$³ is defined for every number λ. If

$$\lim_{n \to \infty} P\{|x_n - x| \leq \lambda\} = 1 \tag{6}$$

for every positive number λ, the sequence $x_n$ is said to converge stochastically,
or to converge in probability, to x. If a is a constant, $P\{|x_n - a| \leq \lambda\}$ is also
defined for every number λ, and there is a corresponding definition of stochastic
convergence to a. The usual theorems about limits hold: if $x_n, y_n$ converge
stochastically to x, y, then $x_n + y_n$ converges stochastically to x + y, etc.

An example of stochastic convergence is given by the law of large numbers.
Let x be a chance variable with distribution function F(x) and suppose that
$E(x)$, $E(x^2)$ exist, i.e. that

$$\int_{-\infty}^{\infty} x\,dF(x), \qquad \int_{-\infty}^{\infty} x^2\,dF(x)$$

are absolutely convergent integrals. Let $x_1, \ldots, x_n$ be chance variables whose
n-variate distribution function is $\prod_{j=1}^{n} F(x_j)$; we are thus supposing that the
variables all have the same distribution and form an independent set. Then
$\frac{1}{n}\sum_{j=1}^{n} x_j$ is a new chance variable, and Tchebycheff's inequality furnishes an
immediate proof that $\frac{1}{n}\sum_{j=1}^{n} x_j$ converges stochastically to $E(x)$.⁴
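Stochastic convergence of the mean can be watched in a small simulation. The Python sketch assumes a hypothetical chance variable uniformly distributed on (0, 1), so that E(x) = 1/2 and E{[x − E(x)]²} = 1/12; it estimates $P\{|\bar{x}_n - E(x)| > \lambda\}$ by repeated sampling with a fixed seed and sets it beside the Tchebycheff bound $E\{[x - E(x)]^2\}/(n\lambda^2)$. The estimate shrinks toward zero as n grows and stays under the (rather loose) bound.

```python
import random


def exceedance_rate(n, lam, trials=2000, seed=1):
    """Monte Carlo estimate of P{|mean of n uniform(0,1) draws - 1/2| > lam}."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() for _ in range(n)) / n
        if abs(mean - 0.5) > lam:
            hits += 1
    return hits / trials


def tchebycheff_bound(n, lam, variance=1 / 12):
    """Upper bound E{[x - E(x)]^2} / (n lam^2) from Tchebycheff's inequality."""
    return variance / (n * lam ** 2)


lam = 0.05
for n in (25, 100, 400):
    print(n, exceedance_rate(n, lam), round(tchebycheff_bound(n, lam), 4))
```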


³ Throughout this paper, if γ represents a set of conditions on chance variables, $P\{\gamma\}$
will denote the probability that those conditions are satisfied.

⁴ If $\bar{x}_n = \frac{1}{n}\sum_{\nu=1}^{n} x_\nu$, then $E(\bar{x}_n) = E(x)$ and $E\{[\bar{x}_n - E(x)]^2\} = \frac{1}{n} E\{[x - E(x)]^2\}$.
Then if λ is any positive number, $P\{|\bar{x}_n - E(x)| > \lambda\} \leq \frac{E\{[x - E(x)]^2\}}{n\lambda^2}$, which implies (6).
There is also another kind of convergence, called convergence with
probability 1. The sequence $\{x_n\}$ converges with probability 1 to x if

$$\lim_{n \to \infty} P\{|x_n - x| \leq \lambda,\ |x_{n+1} - x| \leq \lambda,\ \ldots,\ |x_{n+p} - x| \leq \lambda\} = 1 \tag{7}$$

for every value of $p \geq 0$, uniformly in $p \geq 0$, for every positive number λ. If
p = 0 in (7), (7) becomes (6), so that convergence with probability 1 implies
stochastic convergence. Although the converse is not true, if $\{x_n\}$ is a sequence
of chance variables converging stochastically to x, there is a subsequence of
$\{x_n\}$ which converges with probability 1 to x.⁵ The usual limit theorems hold
here also: if $x_n, y_n$ converge with probability 1 to x, y, then $x_n + y_n$ converges with
probability 1 to x + y, etc.

An example of convergence with probability 1 is the following. In the previous example, remove the hypothesis that $E(x^2)$ exists, so that only the weaker hypothesis of the existence of $E(x)$ is supposed. The Tchebycheff inequality can then no longer be applied, but a different method shows that $\frac{1}{n}\sum_{j=1}^{n} x_j$ converges with probability 1 (and therefore stochastically) to $E(x)$.⁶ This result is known as the strong law of large numbers.
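Here is a path-wise illustration of the strong law. It assumes a Pareto distribution with shape 2 and minimum 1, which has finite mean 2 but infinite second moment, so the Tchebycheff argument is unavailable. The running mean of one simulated sequence settles toward $E(x)$. A single simulated path cannot prove convergence with probability 1, so this is only a sketch. `running_means` and `pareto_path` are hypothetical helpers.

```python
import random

def running_means(xs):
    """Return the partial averages (x_1 + ... + x_k) / k for k = 1, 2, ..."""
    means, total = [], 0.0
    for k, x in enumerate(xs, start=1):
        total += x
        means.append(total / k)
    return means

def pareto_path(n, alpha=2.0, seed=1):
    """n draws from a Pareto law with minimum 1 and shape alpha.
    For alpha = 2 the mean alpha / (alpha - 1) = 2 exists but E(x^2) does not."""
    rng = random.Random(seed)
    return [rng.paretovariate(alpha) for _ in range(n)]

if __name__ == "__main__":
    means = running_means(pareto_path(100_000))
    for k in (10, 1_000, 100_000):
        print(k, means[k - 1])  # drifts toward E(x) = 2
```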

IV

Let $x, x_1, x_2, \dots$ be a sequence of chance variables with distribution functions $F(x), F_1(x), F_2(x), \dots$ respectively. If $\lim_{n\to\infty} F_n(x) = F(x)$ for every value of $x$, the distribution of $x_n$ is said to converge to a limiting distribution with distribution function $F(x)$.

As an example, consider the Laplace-Liapounoff theorem. Let $x_1, x_2, \dots$ be a sequence of independent chance variables (i.e. any finite number of them form an independent set) with the same distribution functions, and let $E(x_n)$, $E(x_n^2)$ exist. We suppose that $\sigma^2 = E\{[x_n - E(x_n)]^2\} > 0$, so that the distribution of $x_n$ is not merely confined to one point. Then the distribution of

(8) $$n^{-1/2}\sum_{j=1}^{n} [x_j - E(x_j)]$$

⁵ The theories of probability and of measure are fundamentally identical. Chance variables correspond to measurable functions. Stochastic convergence corresponds to convergence in measure, and convergence with probability 1 corresponds to convergence almost everywhere. The relation between these two types of convergence is discussed (in the terminology of the measure theory) in E. W. Hobson, The Theory of Functions of a Real Variable, second edition, Vol. 2, pp. 239-244.

⁶ Cf. for instance J. L. Doob, Transactions of the American Mathematical Society, Vol. 30 (1934), pp. 704-765.



164 


J. L. DOOB 


converges to a limiting distribution with distribution function⁷

(9) $$\frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/(2\sigma^2)}\,du .$$
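The following is a Monte Carlo sketch of the Laplace-Liapounoff statement. It assumes Uniform(0, 1) summands, so that $\sigma^2 = 1/12$. It draws the normalized sum (8) many times and compares its empirical distribution function with (9), which is written here through the error function. The function names are illustrative only.

```python
import math
import random

def normal_cdf(x, sigma):
    """Distribution function (9): normal, mean 0, standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def normalized_sums(n, trials=4000, seed=2):
    """Draw n**-0.5 * sum_j (x_j - E(x_j)) for Uniform(0, 1) summands."""
    rng = random.Random(seed)
    return [sum(rng.random() - 0.5 for _ in range(n)) / math.sqrt(n)
            for _ in range(trials)]

def empirical_cdf(samples, x):
    """Fraction of samples strictly below x, matching P{S < x}."""
    return sum(1 for s in samples if s < x) / len(samples)

if __name__ == "__main__":
    sigma = math.sqrt(1 / 12)
    sums = normalized_sums(30)
    for x in (-sigma, 0.0, sigma):
        print(x, empirical_cdf(sums, x), normal_cdf(x, sigma))
```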

The convergence of a sequence of $n$-variate distributions is defined as the convergence of the distribution functions, just as above for $n = 1$. Suppose that $(x_{11}, \dots, x_{n1})$, $(x_{12}, \dots, x_{n2})$, ... are independent sets of chance variables (i.e. the distribution function of any finite number of sets is the product of the distribution functions of the sets) with the same distribution functions. We suppose that $E(x_{j1}^2)$ exist, $j = 1, \dots, n$, and that $\sigma_j^2 = E\{[x_{j1} - E(x_{j1})]^2\} > 0$. Let $X_{jm} = m^{-1/2}\sum_{i=1}^{m} [x_{ji} - E(x_{ji})]$. Then, as $m \to \infty$, the $n$-variate distribution of $X_{1m}, \dots, X_{nm}$ converges to the normal distribution⁸ about zero means with variances $\sigma_1^2, \dots, \sigma_n^2$ and correlation coefficients $\{\rho_{rs}\}$, where $\sigma_r\sigma_s\rho_{rs} = E\{[x_{r1} - E(x_{r1})][x_{s1} - E(x_{s1})]\}$.

Three lemmas will be needed below in applying these concepts.

LEMMA 1. If $\{x_n\}$ is a sequence of chance variables whose distributions approach a limiting distribution, and if $\{y_n\}$ is a sequence of chance variables converging stochastically to 0, the sequence $\{x_ny_n\}$ converges stochastically to 0.

For if $F(x)$ is the distribution function of the limiting distribution, and if $\lambda$, $\mu$ are any positive numbers,

(10) $$\begin{aligned}
P\{|x_ny_n| < \lambda\} &\ge P\{|x_ny_n| < \lambda,\ |y_n| \le \mu\} \ge P\{|x_n| < \lambda/\mu,\ |y_n| \le \mu\} \\
&\ge P\{|y_n| \le \mu\} - P\{|x_n| \ge \lambda/\mu\} = -P\{|y_n| > \mu\} + P\{|x_n| < \lambda/\mu\} \\
&\ge -P\{|y_n| > \mu\} + P\{x_n < \lambda/\mu\} - P\{x_n < -\lambda/(2\mu)\} .
\end{aligned}$$

Then, letting $n$ become infinite,⁹

(11) $$\liminf_{n\to\infty} P\{|x_ny_n| < \lambda\} \ge F(\lambda/\mu) - F(-\lambda/(2\mu)) .$$

Letting $\mu$ approach 0, $F(\lambda/\mu)$ approaches 1, $F(-\lambda/(2\mu))$ approaches 0, and the right hand side becomes 1, as was to be proved.
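Lemma 1 can be watched at work in a small simulation. For illustration only, the sketch assumes that $x_n$ is standard normal for every $n$, so its distribution trivially has a limit. It also assumes that $y_n$ is the deviation of the mean of $n$ Uniform(0, 1) draws from 1/2, which converges stochastically to 0. The estimated $P\{|x_ny_n| \ge \lambda\}$ should shrink toward 0 as $n$ grows. `prob_product_large` is a hypothetical name.

```python
import random

def prob_product_large(n, lam=0.1, trials=2000, seed=3):
    """Estimate P{|x_n * y_n| >= lam} with x_n ~ N(0, 1) and
    y_n = (mean of n Uniform(0, 1) draws) - 1/2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x_n = rng.gauss(0.0, 1.0)
        y_n = sum(rng.random() for _ in range(n)) / n - 0.5
        if abs(x_n * y_n) >= lam:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    for n in (4, 40, 400):
        print(n, prob_product_large(n))
```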

LEMMA 2. Let $\{x_n\}$, $\{y_n\}$, $\{z_n\}$ be sequences of chance variables such that the distribution of $x_n$ approaches a limiting distribution with continuous distribution function $F(x)$, and such that the sequences $\{y_n\}$, $\{z_n\}$ converge stochastically to 0, 1 respectively. Then the distributions of $x_n/z_n$¹⁰ and of $x_n + y_n$ approach limiting distributions with the same distribution function $F(x)$.

⁷ A. Khintchine, Ergebnisse der Mathematik, Vol. 2, No. 4: Asymptotische Gesetze der Wahrscheinlichkeitsrechnung, pp. 1-8.

⁸ Ibid., pp. 11-16.

⁹ If $\{a_n\}$ is a sequence of real numbers, $\limsup_{n\to\infty} a_n$ is defined as $\lim_{n\to\infty}$ (least upper bound of $a_n, a_{n+1}, \dots$), and $\liminf_{n\to\infty} a_n$ is defined as $-\limsup_{n\to\infty}(-a_n)$. A necessary and sufficient condition that the sequence $\{a_n\}$ converge to a limit $a$ is that $\liminf_{n\to\infty} a_n = \limsup_{n\to\infty} a_n = a$.

¹⁰ Since $z_n$ converges stochastically to 1, the probability that $z_n = 0$ approaches 0. The theorem is independent of the way $x_n/z_n$ is defined when $z_n = 0$.
Since

$$\frac{x_n}{z_n} = x_n + x_n\,\frac{1 - z_n}{z_n}$$

(neglecting the possibility that $z_n$ may vanish), and since the last term converges stochastically to 0 by Lemma 1 ($(1 - z_n)/z_n$ converges stochastically to 0), it is sufficient to prove the second part of the theorem. If $\epsilon > 0$, and if $x$ is an arbitrary number,

(12) $$P\{x_n + y_n < x\} = P\{x_n + y_n < x,\ |y_n| \le \epsilon\} + P\{x_n + y_n < x,\ |y_n| > \epsilon\} .$$

Since the sequence $\{y_n\}$ converges stochastically to 0,

(13) $$\lim_{n\to\infty} P\{x_n + y_n < x,\ |y_n| > \epsilon\} \le \lim_{n\to\infty} P\{|y_n| > \epsilon\} = 0,$$

so that in the limit the second term in (12) can be neglected. Moreover

(14) $$P\{x_n + y_n < x,\ |y_n| \le \epsilon\} \le P\{x_n < x + \epsilon\} .$$

If we let $n$ become infinite and then let $\epsilon$ approach 0, (14) becomes

(15) $$\limsup_{n\to\infty} P\{x_n + y_n < x\} \le F(x) .$$

A similar argument shows that

(16) $$\liminf_{n\to\infty} P\{x_n + y_n < x\} \ge F(x) ,$$

and (15), (16) taken together imply that

(17) $$\lim_{n\to\infty} P\{x_n + y_n < x\} = F(x) ,$$

as was to be proved.
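A common use of Lemma 2 is replacing an unknown scale by a consistent estimate. The sketch below assumes Uniform(0, 1) data and is illustrative only; `lemma2_samples` is a hypothetical name.

- $x_n = \sqrt{n}(\bar{x} - 1/2)/\sigma$ has a normal limit.
- $z_n = s/\sigma$, where $s$ is the sample standard deviation, converges stochastically to 1.
- $y_n = \bar{x} - 1/2$ converges stochastically to 0.

Applying the lemma twice, $x_n/z_n + y_n$ should again have the standard normal limit.

```python
import math
import random

def standard_normal_cdf(x):
    """Distribution function of the standard normal law."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def lemma2_samples(n, trials=4000, seed=4):
    """Samples of x_n / z_n + y_n for Uniform(0, 1) data, where
    x_n = sqrt(n) (mean - 1/2) / sigma, z_n = s / sigma, y_n = mean - 1/2."""
    rng = random.Random(seed)
    sigma = math.sqrt(1 / 12)          # standard deviation of Uniform(0, 1)
    out = []
    for _ in range(trials):
        data = [rng.random() for _ in range(n)]
        mean = sum(data) / n
        s = math.sqrt(sum((v - mean) ** 2 for v in data) / (n - 1))
        x_n = math.sqrt(n) * (mean - 0.5) / sigma   # limiting law N(0, 1)
        z_n = s / sigma                             # -> 1 stochastically
        y_n = mean - 0.5                            # -> 0 stochastically
        out.append(x_n / z_n + y_n)
    return out

if __name__ == "__main__":
    samples = lemma2_samples(100)
    frac = sum(1 for v in samples if v < 1.0) / len(samples)
    print(frac, standard_normal_cdf(1.0))
```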

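LEMMA 3. If $x_1, x_2, x_3, x_4$ are chance variables whose distribution has density function

$$\frac{1}{(2\pi)^2}\, e^{-\frac{1}{2}(x_1^2 + x_2^2 + x_3^2 + x_4^2)},$$

the distribution of $z = x_1x_2 - x_3x_4$ has density function $\frac{1}{2}e^{-|z|}$.

The distribution of $u = x_1x_2$ and that of $v = -x_3x_4$ have the same density function:

(18) $$\frac{1}{\pi}\int_0^{\infty} e^{-\frac{1}{2}\left(t^2 + \frac{u^2}{t^2}\right)}\,\frac{dt}{t} .$$

Hence the distribution of $z$ has density function

(19) $$\frac{1}{\pi^2}\int_{-\infty}^{\infty}\int_0^{\infty}\int_0^{\infty} e^{-\frac{1}{2}\left[t^2 + \frac{\lambda^2}{t^2} + \tau^2 + \frac{(z - \lambda)^2}{\tau^2}\right]}\,\frac{dt\,d\tau}{t\,\tau}\,d\lambda .$$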
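We first integrate out $\lambda$; the integral over $\lambda$ equals $\sqrt{2\pi}\,t\tau\,r^{-1}e^{-z^2/(2r^2)}$, where $r^2 = t^2 + \tau^2$. Changing to polar coordinates, $t = r\cos\theta$, $\tau = r\sin\theta$, we obtain

$$\frac{\sqrt{2\pi}}{\pi^2}\int_0^{\pi/2}\int_0^{\infty} e^{-\frac{1}{2}\left(r^2 + \frac{z^2}{r^2}\right)}\,dr\,d\theta = \frac{1}{2}e^{-|z|} .$$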
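Lemma 3 is easy to check by simulation. The sketch below is illustrative, and the helper name is hypothetical. It draws $z = x_1x_2 - x_3x_4$ from independent standard normals and compares two features of the sample with the density $\frac{1}{2}e^{-|z|}$:

- the mean of $|z|$, which should be 1;
- the frequency of $|z| > 1$, which should be $e^{-1} \approx 0.368$.

```python
import math
import random

def laplace_check(trials=20000, seed=5):
    """Return (mean of |z|, frequency of |z| > 1) for z = x1*x2 - x3*x4."""
    rng = random.Random(seed)
    abs_sum, big = 0.0, 0
    for _ in range(trials):
        x1, x2, x3, x4 = (rng.gauss(0.0, 1.0) for _ in range(4))
        z = x1 * x2 - x3 * x4
        abs_sum += abs(z)
        if abs(z) > 1.0:
            big += 1
    return abs_sum / trials, big / trials

if __name__ == "__main__":
    mean_abs, tail = laplace_check()
    print(mean_abs, 1.0)          # density (1/2)e^{-|z|} gives E|z| = 1
    print(tail, math.exp(-1.0))   # and P{|z| > 1} = e^{-1}
```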



THEOREM 1. Let $x_1, x_2, x_3, x_4$ determine a 4-variate distribution with distribution function $F(x_1, x_2, x_3, x_4)$. Suppose that $E(x_i^2)$, $E(x_i^2x_j^2)$ exist, $i, j = 1, \dots, 4$, and suppose that $E(x_i) = 0$, $E(x_i^2) = 1$,¹¹ $i = 1, 2, 3, 4$. Let $x_{1j}, x_{2j}, x_{3j}, x_{4j}$ have the same 4-variate distribution as $x_1, x_2, x_3, x_4$, $j = 1, \dots, n$, and let the $4n$-variate distribution function of $\{x_{ij}\}$ be $\prod_{j=1}^{n} F(x_{1j}, x_{2j}, x_{3j}, x_{4j})$. We shall use the following notation (which suppresses the dependence on $n$):

(20) $$t_i = \frac{1}{n}\sum_{j=1}^{n} x_{ij}, \qquad s_{ij} = \frac{1}{n}\sum_{k=1}^{n} x_{ik}x_{jk}, \qquad \rho_{ij} = E(x_ix_j) .$$

Let $\varphi$ be a function of the $t_i$, $s_{ij}$, defined in a neighborhood $N$ of the point $P$: $t_i = 0$, $s_{ij} = \rho_{ij}$, which, together with its second partial derivatives, is continuous in $N$. Define $\sigma \ge 0$ by

(21) $$\sigma^2 = E\left\{\left[\sum_{i=1}^{4} \frac{\partial\varphi}{\partial t_i}\,x_i - \sum_{j \le k} \frac{\partial\varphi}{\partial s_{jk}}\,(\rho_{jk} - x_jx_k)\right]^2\right\},$$

where the partial derivatives are evaluated at $P$. Then if $\sigma > 0$, the distribution of $\sqrt{n}\,[\varphi - \varphi(P)]$ (where $\varphi$ has the arguments $t_i$, $s_{ij}$) converges to a limiting distribution which is normal with mean 0 and variance $\sigma^2$.

To prove this theorem we expand $\varphi$ in the neighborhood of $P$, obtaining

(22) $$\sqrt{n}\,[\varphi - \varphi(P)] = \sum_{i=1}^{4} \frac{\partial\varphi}{\partial t_i}\,\sqrt{n}\,t_i - \sum_{j \le k} \frac{\partial\varphi}{\partial s_{jk}}\,\sqrt{n}\,(\rho_{jk} - s_{jk}) + R_n,$$

where the partial derivatives are evaluated at $P$. Here $R_n$ consists of a linear combination of $\sqrt{n}\,t_it_j$, $\sqrt{n}\,t_i(\rho_{jk} - s_{jk})$, $\sqrt{n}\,(\rho_{ij} - s_{ij})(\rho_{kl} - s_{kl})$, with coefficients which are uniformly bounded as long as $t_i$, $s_{ij}$ are in the neighborhood $N$. Now

(23) $$\lim_{n\to\infty} t_i = 0, \qquad \lim_{n\to\infty} s_{ij} = \rho_{ij}$$

with probability 1, by the law of large numbers, and as $n$ becomes infinite the distributions of $\sqrt{n}\,t_i$, $\sqrt{n}\,(\rho_{ij} - s_{ij})$ converge to limiting distributions, by the

¹¹ The hypothesis that $E(x_i) = 0$ involves no real restriction, since the general case can be reduced to this one by substituting $x_i - E(x_i)$ for $x_i$. The hypothesis that $E(x_i^2) = 1$ can be met by substituting $x_i/[E(x_i^2)]^{1/2}$ whenever $E(x_i^2) > 0$, which will always be true unless $x_i = 0$ with probability 1.
Laplace-Liapounoff theorem. Then by Lemma 1, the terms of $R_n$ converge stochastically to 0. The other terms of $\sqrt{n}\,[\varphi - \varphi(P)]$ are sums to which the Laplace-Liapounoff theorem can be applied, giving, with Lemma 2, the desired conclusion.

As an example of the application of this theorem, we suppose that $\varphi$ is a correlation coefficient:

(24) $$\varphi = \frac{s_{12}}{(s_{11}s_{22})^{1/2}}, \qquad \varphi(P) = \rho_{12} .$$

Here $\sigma^2$ is $E\{[x_1x_2 - \frac{1}{2}\rho_{12}(x_1^2 + x_2^2)]^2\}$, which reduces to the familiar result $(1 - \rho_{12}^2)^2$ when the bivariate distribution of $x_1$, $x_2$ is normal. Moreover, $\sigma = 0$ only when, with probability 1,

(25) $$2x_1x_2 = \rho_{12}(x_1^2 + x_2^2) .$$
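The correlation example can be checked against Theorem 1 by simulation. The sketch assumes bivariate normal pairs with $\rho_{12} = 0.5$. It builds $s_{ij}$ about the true means, as in (20), and compares $n$ times the simulated variance of $\varphi = s_{12}/(s_{11}s_{22})^{1/2}$ with $(1 - \rho_{12}^2)^2$. This is a rough check, not part of the original paper, and the helper names are hypothetical.

```python
import math
import random

def corr_about_true_mean(pairs):
    """phi = s12 / sqrt(s11 * s22), with s_ij = (1/n) sum x_i x_j as in (20)."""
    n = len(pairs)
    s11 = sum(a * a for a, _ in pairs) / n
    s22 = sum(b * b for _, b in pairs) / n
    s12 = sum(a * b for a, b in pairs) / n
    return s12 / math.sqrt(s11 * s22)

def scaled_variance(rho, n=100, trials=2000, seed=6):
    """Monte Carlo estimate of n * Var(phi) for normal pairs with correlation rho."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    values = []
    for _ in range(trials):
        pairs = []
        for _ in range(n):
            g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            pairs.append((g1, rho * g1 + scale * g2))  # unit variances, corr rho
        values.append(corr_about_true_mean(pairs))
    mean = sum(values) / trials
    return n * sum((v - mean) ** 2 for v in values) / (trials - 1)

if __name__ == "__main__":
    rho = 0.5
    print(scaled_variance(rho), (1 - rho ** 2) ** 2)  # simulated vs. sigma^2
```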


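As a second example we suppose that $\varphi$ is a tetrad difference:

(26) $$\varphi = \frac{s_{13}s_{24} - s_{14}s_{23}}{(s_{11}s_{22}s_{33}s_{44})^{1/2}}, \qquad \varphi(P) = \rho_{13}\rho_{24} - \rho_{14}\rho_{23} .$$

Here $\sigma^2$ becomes

(27) $$E\left\{\left[\rho_{24}x_1x_3 + \rho_{13}x_2x_4 - \rho_{14}x_2x_3 - \rho_{23}x_1x_4 - \tfrac{1}{2}(\rho_{13}\rho_{24} - \rho_{14}\rho_{23})(x_1^2 + x_2^2 + x_3^2 + x_4^2)\right]^2\right\},$$

and $\sigma^2 = 0$ only when the quantity in the brackets vanishes with probability 1.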
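The bracket in (27) can be evaluated directly. For illustration only, the sketch below assumes a one-factor normal model $x_i = \ell_ig + (1 - \ell_i^2)^{1/2}e_i$ with independent standard normal $g, e_1, \dots, e_4$. Then $\rho_{ij} = \ell_i\ell_j$ and the population tetrad difference vanishes. The sketch estimates $\sigma^2$ by averaging the squared bracket. It also shows that when $\rho_{13} = \rho_{14} = \rho_{23} = \rho_{24} = 0$ the bracket is identically zero, which is the degenerate case taken up below. All names are hypothetical.

```python
import math
import random

def one_factor_sample(loadings, rng):
    """One draw of (x1, ..., x4) with x_i = w_i g + sqrt(1 - w_i^2) e_i."""
    g = rng.gauss(0.0, 1.0)
    return [w * g + math.sqrt(1.0 - w * w) * rng.gauss(0.0, 1.0) for w in loadings]

def tetrad_bracket(x, rho):
    """Quantity inside the brackets of (27); rho[i, j] holds rho_ij (1-based)."""
    x1, x2, x3, x4 = x
    tetrad = rho[1, 3] * rho[2, 4] - rho[1, 4] * rho[2, 3]
    return (rho[2, 4] * x1 * x3 + rho[1, 3] * x2 * x4
            - rho[1, 4] * x2 * x3 - rho[2, 3] * x1 * x4
            - 0.5 * tetrad * (x1 * x1 + x2 * x2 + x3 * x3 + x4 * x4))

def sigma2_tetrad(loadings, trials=20000, seed=7):
    """Monte Carlo estimate of sigma^2 in (27) under the one-factor model,
    where rho_ij = w_i w_j for i != j."""
    rng = random.Random(seed)
    rho = {(i, j): loadings[i - 1] * loadings[j - 1]
           for i in range(1, 5) for j in range(1, 5) if i != j}
    total = 0.0
    for _ in range(trials):
        total += tetrad_bracket(one_factor_sample(loadings, rng), rho) ** 2
    return total / trials

if __name__ == "__main__":
    print(sigma2_tetrad([0.8, 0.7, 0.6, 0.5]))  # positive: Theorem 1 applies
    print(sigma2_tetrad([0.8, 0.7, 0.0, 0.0]))  # rho13 = rho14 = rho23 = rho24 = 0
```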

If in either of the two above cases $s_{ij} - t_it_j$ is substituted for $s_{ij}$ (i.e. if the deviations from the sample mean, not those from the true mean, are used), the result is unaltered. This is true in general, since $\partial\varphi/\partial t_i$ and $\partial\varphi/\partial s_{ij}$ are unaltered at $P$ by this substitution.

There is a well-known $\delta$-method used in statistics to find limiting variances of statistics of the type covered by Theorem 1,¹² and Theorem 1 shows an interpretation which can be given to the results obtained by this method.

We now investigate the necessary modification of Theorem 1 if $\sigma = 0$, i.e. if

(28) $$\sum_{i=1}^{4} \frac{\partial\varphi}{\partial t_i}\,x_i - \sum_{j \le k} \frac{\partial\varphi}{\partial s_{jk}}\,(\rho_{jk} - x_jx_k) = 0$$

with probability 1. If we assume that $\varphi$ has continuous third partial derivatives in the neighborhood $N$, we find that

¹² Examples of the use of this method can be found in T. L. Kelley, Crossroads in the Mind of Man, Stanford University (1928), pp. 49-50, and in an article by S. Wright, Annals of Mathematical Statistics, Vol. 5 (1934), p. 211.
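(29) $$\begin{aligned}
n\,[\varphi - \varphi(P)] = {} & \frac{1}{2}\sum_{i,j} \frac{\partial^2\varphi}{\partial t_i\,\partial t_j}\,\sqrt{n}\,t_i\,\sqrt{n}\,t_j + \sum_{i}\sum_{j \le k} \frac{\partial^2\varphi}{\partial t_i\,\partial s_{jk}}\,\sqrt{n}\,t_i\,\sqrt{n}\,(s_{jk} - \rho_{jk}) \\
& + \frac{1}{2}\sum_{i \le j}\sum_{k \le l} \frac{\partial^2\varphi}{\partial s_{ij}\,\partial s_{kl}}\,\sqrt{n}\,(s_{ij} - \rho_{ij})\,\sqrt{n}\,(s_{kl} - \rho_{kl}) + R_n,
\end{aligned}$$

where $R_n$ converges stochastically to 0. The second degree terms constitute a quadratic form in $\sqrt{n}\,t_i$, $\sqrt{n}\,(s_{jk} - \rho_{jk})$. Now, by the Laplace-Liapounoff theorem, the multivariate distribution of $\{\sqrt{n}\,t_i,\ \sqrt{n}\,(s_{jk} - \rho_{jk})\}$ converges to a normal distribution whose variances and correlation coefficients are those of $x_i$, $x_jx_k$. The distribution of $n\,[\varphi - \varphi(P)]$ thus converges to the distribution of the quadratic form

(30) $$\frac{1}{2}\sum_{i,j} \frac{\partial^2\varphi}{\partial t_i\,\partial t_j}\,\xi_i\xi_j + \sum_{i}\sum_{j \le k} \frac{\partial^2\varphi}{\partial t_i\,\partial s_{jk}}\,\xi_i\eta_{jk} + \frac{1}{2}\sum_{i \le j}\sum_{k \le l} \frac{\partial^2\varphi}{\partial s_{ij}\,\partial s_{kl}}\,\eta_{ij}\eta_{kl},$$

where $\{\xi_i, \eta_{jk}\}$ have the multivariate distribution just described, unless the quadratic form vanishes identically. This reasoning can be continued. The general result is that, if $\varphi$ is sufficiently regular, there is some power $\nu$ of $n$ such that the distribution of $n^{\nu}[\varphi - \varphi(P)]$ converges to a limiting distribution.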

When $\sigma = 0$ in the second example, unless the distribution of $x_1, x_2, x_3, x_4$ is confined with probability 1 to a 4-dimensional quadric, $\rho_{13} = \rho_{14} = \rho_{23} = \rho_{24} = 0$. Equation (29) becomes

(29′) $$n\,[\varphi - \varphi(P)] = n\,(s_{13}s_{24} - s_{14}s_{23}) + R_n .$$

Now if $x_1$, $x_2$ are transformed by a linear homogeneous transformation with determinant $\Delta$, it is readily seen that $s_{13}s_{24} - s_{14}s_{23}$ is multiplied by $\Delta$. The same is true of $x_3$, $x_4$. If $x_1$, $x_2$ are transformed into $x_1'$, $x_2'$ so that $E(x_1'^2) = E(x_2'^2) = 1$, $E(x_1'x_2') = 0$, the determinant of the transformation is $\pm(1 - \rho_{12}^2)^{-1/2}$. Then, transforming each pair $(x_1, x_2)$, $(x_3, x_4)$ in this way into $(x_1', x_2')$, $(x_3', x_4')$, the variables $x_1', x_2', x_3', x_4'$ are uncorrelated. If $s_{ij}' = \frac{1}{n}\sum_{k=1}^{n} x_{ik}'x_{jk}'$,

(31) $$s_{13}s_{24} - s_{14}s_{23} = \frac{s_{13}'s_{24}' - s_{14}'s_{23}'}{\pm(1 - \rho_{12}^2)^{-1/2}(1 - \rho_{34}^2)^{-1/2}} .$$

The limiting distribution of $n\,(s_{13}'s_{24}' - s_{14}'s_{23}')$ is the distribution of $\xi_{13}\xi_{24} - \xi_{14}\xi_{23}$. These four chance variables are normally distributed, with $E(\xi_{13}) = E(\xi_{24}) = E(\xi_{14}) = E(\xi_{23}) = 0$, $E(\xi_{ij}^2) = E(x_i'^2x_j'^2)$, $E(\xi_{ij}\xi_{kl}) = E(x_i'x_j'x_k'x_l')$. Now suppose $x_1, x_2, x_3, x_4$ are normally distributed, the most important case for statistical purposes. Then $x_1', x_2', x_3', x_4'$ will also be distributed normally, and the vanishing of the correlation coefficients means that the chance variables are independent. If this is true,

(32) $$E(\xi_{ij}^2) = 1, \qquad E(\xi_{ij}\xi_{kl}) = 0 \quad (\xi_{ij} \ne \xi_{kl}) .$$
Evidently, however, $x_1', x_2', x_3', x_4'$ do not have to be independent to make these equations valid. It is more than sufficient if the pairs $(x_1, x_2)$, $(x_3, x_4)$, and therefore the pairs $(x_1', x_2')$, $(x_3', x_4')$, are independent. If (32) is true, the $\xi$'s are independent, each one being normally distributed with mean 0 and variance 1. Summarizing these results, and using Lemma 3: if $\varphi$ is the tetrad difference and if $\rho_{13} = \rho_{14} = \rho_{23} = \rho_{24} = 0$, the distribution of $n\,[\varphi - \varphi(P)]$ converges to a limiting distribution. If in addition the distribution of $x_1, x_2, x_3, x_4$ is normal, or if the pairs $(x_1, x_2)$, $(x_3, x_4)$ are independent, this limiting distribution has density function

$$\frac{c}{2}\,e^{-c|z|},$$

where $c = (1 - \rho_{12}^2)^{-1/2}(1 - \rho_{34}^2)^{-1/2}$.

Wilks has investigated the case where $x_1, x_2, x_3, x_4$ are normally and independently distributed, and in this case found the exact variance of the tetrad difference as a function of $n$.¹³
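The final statement can be checked by simulation under its stated assumptions. The sketch below is illustrative, and its names are hypothetical. It takes independent normal pairs $(x_1, x_2)$ and $(x_3, x_4)$ with $\rho_{12} = 0.6$ and $\rho_{34} = 0.3$, so that $\rho_{13} = \rho_{14} = \rho_{23} = \rho_{24} = 0$ and $\varphi(P) = 0$. It then computes $n\varphi$, with $\varphi$ as in (26), from samples of size $n$. For the density $\frac{c}{2}e^{-c|z|}$, the mean of $|n\varphi|$ is $1/c = (1 - \rho_{12}^2)^{1/2}(1 - \rho_{34}^2)^{1/2}$.

```python
import math
import random

def correlated_pair(rho, rng):
    """Standard normal pair with correlation rho."""
    g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return g1, rho * g1 + math.sqrt(1.0 - rho * rho) * g2

def n_tetrad(n, rho12, rho34, rng):
    """n * phi, phi = (s13 s24 - s14 s23) / sqrt(s11 s22 s33 s44) as in (26)."""
    rows = [correlated_pair(rho12, rng) + correlated_pair(rho34, rng)
            for _ in range(n)]
    s = {}
    for i in range(4):
        for j in range(i, 4):
            s[i, j] = sum(r[i] * r[j] for r in rows) / n   # 0-based indices
    num = s[0, 2] * s[1, 3] - s[0, 3] * s[1, 2]
    return n * num / math.sqrt(s[0, 0] * s[1, 1] * s[2, 2] * s[3, 3])

def mean_abs_n_tetrad(rho12, rho34, n=200, trials=1000, seed=8):
    """Monte Carlo mean of |n * phi| over independent samples of size n."""
    rng = random.Random(seed)
    return sum(abs(n_tetrad(n, rho12, rho34, rng)) for _ in range(trials)) / trials

if __name__ == "__main__":
    rho12, rho34 = 0.6, 0.3
    inv_c = math.sqrt((1 - rho12 ** 2) * (1 - rho34 ** 2))
    print(mean_abs_n_tetrad(rho12, rho34), inv_c)
```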

Columbia University


¹³ Proceedings of the National Academy of Sciences, Vol. 18 (1932), pp. 562-565.




ON THE POSTULATE OF THE ARITHMETIC MEAN 

By Richmond T. Zoch 

Introduction 

Suppose $n$ observations have been made of an unknown quantity. It is desired to know the most probable value of the unknown. When Gauss gave his development of the so-called Normal Law of Error, he assumed that the Arithmetic Mean of the $n$ observations is the most probable value. The question arises: Can this postulate be justified?

In the excellent book entitled “Calculus of Observations,” by Whittaker and Robinson,¹ there is given a proof which purports to deduce the postulate of the Arithmetic Mean from assumptions of a more elementary nature. This proof is not correct.

Since this book has had wide circulation, it is believed that the errors in this proof should be called to the attention of the users of the book. The present paper has been prepared for this purpose. The first part of this paper points out the questionable features of the proof given in Whittaker and Robinson's book. The second part gives some critical comments on the original sources from which Whittaker and Robinson obtained their proof.

Part 1 

The assumptions on which Whittaker and Robinson based their proof of the postulate of the Arithmetic Mean are:

Axiom I. The differences between the most probable value and the individual measures do not depend on the position of the null-point from which they are reckoned.

Axiom II. The ratio of the most probable value to any individual measure does not depend on the unit in terms of which the measures are reckoned.

Axiom III. The most probable value is independent of the order in which the measurements are made, and so is a symmetric function of the measures.

Axiom IV. The most probable value, regarded as a function of the individual measures, has one-valued and continuous first derivatives with respect to them.

It is fairly easy to show that if the Arithmetic Mean is the most probable value, then the above four axioms follow as conclusions. The converse, however, is not true: assuming the above four axioms does not make the Arithmetic Mean the most probable value. That is to say the above assumptions are

¹ The Calculus of Observations by E. T. Whittaker and G. Robinson, Blackie & Son, Ltd., London (1929), pp. 215-217.
necessary conditions, but not sufficient conditions. For, consider the following function of the measures:

$$\frac{\mu_3}{\mu_2} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3}{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2},$$

where $\bar{x}$ is the Arithmetic Mean of the $x_i$.

Clearly this function is a symmetric function of the measures $(x_i)$ and therefore satisfies Axiom III. If the $x_i$ are each multiplied by $k$, then the Arithmetic Mean $(\bar{x})$ is also multiplied by $k$ and we have

$$\frac{\frac{1}{n}\sum_{i=1}^{n}(kx_i - k\bar{x})^3}{\frac{1}{n}\sum_{i=1}^{n}(kx_i - k\bar{x})^2} = k\,\frac{\mu_3}{\mu_2};$$

that is to say, multiplying the individual measures by $k$ is the same as multiplying the function $\mu_3/\mu_2$ by $k$. Therefore the ratio of any individual measure to the most probable value (function) does not depend on the unit used. Hence the function $\mu_3/\mu_2$ satisfies Axiom II.

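The partial derivative of $\mu_3/\mu_2$ with respect to $x_i$ is

$$\frac{\partial}{\partial x_i}\left(\frac{\mu_3}{\mu_2}\right) = \frac{1}{\mu_2^2}\left(\mu_2\,\frac{\partial\mu_3}{\partial x_i} - \mu_3\,\frac{\partial\mu_2}{\partial x_i}\right) = \frac{3\mu_2[(x_i - \bar{x})^2 - \mu_2] - 2\mu_3(x_i - \bar{x})}{n\mu_2^2},$$

since $\partial\bar{x}/\partial x_i = 1/n$ and $\sum_{i=1}^{n}(x_i - \bar{x}) = 0$, which give $\partial\mu_3/\partial x_i = \frac{3}{n}[(x_i - \bar{x})^2 - \mu_2]$ and $\partial\mu_2/\partial x_i = \frac{2}{n}(x_i - \bar{x})$. The partial derivatives of $\mu_3/\mu_2$ with respect to each of the $x_i$ are of the same literal form. Clearly these partial derivatives are single-valued and continuous (wherever $\mu_2 \ne 0$). Therefore the function $\mu_3/\mu_2$ satisfies Axiom IV.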
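The derivative formula above can be checked against central finite differences. The sketch uses illustrative data and hypothetical helper names. `mu` computes the central moments $\mu_r = \frac{1}{n}\sum(x_i - \bar{x})^r$. The sketch also confirms that the derivatives sum to zero, as translation invariance requires.

```python
def mu(xs, r):
    """Central moment mu_r = (1/n) * sum (x_i - mean)^r."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** r for x in xs) / len(xs)

def ratio(xs):
    """The function mu_3 / mu_2."""
    return mu(xs, 3) / mu(xs, 2)

def ratio_gradient(xs):
    """Closed form (3 mu2 [(x_i - xbar)^2 - mu2] - 2 mu3 (x_i - xbar)) / (n mu2^2)."""
    n, m = len(xs), sum(xs) / len(xs)
    m2, m3 = mu(xs, 2), mu(xs, 3)
    return [(3 * m2 * ((x - m) ** 2 - m2) - 2 * m3 * (x - m)) / (n * m2 ** 2)
            for x in xs]

def numeric_gradient(f, xs, h=1e-6):
    """Central-difference approximation to the gradient of f at xs."""
    grad = []
    for i in range(len(xs)):
        up = list(xs)
        up[i] += h
        down = list(xs)
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

if __name__ == "__main__":
    data = [1.0, 3.0, 4.0, 7.5]
    print(ratio_gradient(data))
    print(numeric_gradient(ratio, data))
```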

Now it can be shown that if $h$ is added to each $x_i$, the function $\mu_3/\mu_2$ is unchanged, and hence this function does not satisfy Axiom I. (It should be noted that the function $\mu_3/\mu_2$ is invariant under the transformation specified by Axiom I.) However, consider the function $\bar{x} + a\,\mu_3/\mu_2 = f$, where $a$ is a constant independent of the $x_i$. Clearly, $f$ satisfies all of the four axioms.
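The claim about $f = \bar{x} + a\,\mu_3/\mu_2$ can be verified exactly with rational arithmetic: it satisfies Axioms I to III while differing from the Arithmetic Mean. The sketch uses Python's `fractions.Fraction`. The data, the constant $a = 1$, and the function names are illustrative assumptions. Axiom IV was checked through the derivative formula above.

```python
from fractions import Fraction

def mu(xs, r):
    """Central moment mu_r = (1/n) * sum (x_i - mean)^r, exact for Fractions."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** r for x in xs) / len(xs)

def f(xs, a=1):
    """f = mean + a * mu_3 / mu_2 (requires mu_2 != 0)."""
    return sum(xs) / len(xs) + a * mu(xs, 3) / mu(xs, 2)

if __name__ == "__main__":
    xs = [Fraction(1), Fraction(3), Fraction(4)]
    k, h = Fraction(5, 2), Fraction(7)
    print(f(xs))                                  # 46/21, not the mean 8/3
    print(f([k * x for x in xs]) == k * f(xs))     # Axiom II: True
    print(f([x + h for x in xs]) == f(xs) + h)     # Axiom I: True
    print(f(xs[::-1]) == f(xs))                    # Axiom III: True
```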

Thus a function, distinct from the Arithmetic Mean, has here been exhibited which satisfies the four axioms given in Whittaker and Robinson's book. Hence, these four axioms are not sufficient to establish the postulate of the Arithmetic Mean. The question arises: Where is the proof given by Whittaker and Robinson lacking in rigor? The proof given is essentially as follows. (No part of the proof given by Whittaker and Robinson is here omitted; in fact, for the sake of rigor and careful reasoning, further explanations are given and the various steps are numbered.)

(1) Suppose the most probable value is expressed in terms of the $n$ measures $x_1, x_2, \dots, x_n$ by the function $\phi(x_1, x_2, \dots, x_n)$. That is to say, the most probable value is some function, $\phi$, of the observations, or: the most probable value $= \phi(x_1, x_2, \dots, x_n)$.

(2) By the theorem of the mean value in the differential calculus, which by Axiom IV is applicable, we have

$$\phi(kx_1, kx_2, \dots, kx_n) = \phi(0, 0, \dots, 0) + kx_1\left[\frac{\partial\phi}{\partial x_1}\right] + \dots + kx_n\left[\frac{\partial\phi}{\partial x_n}\right],$$

where the square brackets denote that every $x_i$ is to be replaced by $\theta kx_i$, where $\theta$ lies between 0 and 1.

(3) By Axiom II, the left hand side $= k\phi(x_1, x_2, \dots, x_n)$.

(4) By the continuity of $\phi$, postulated in Axiom IV, the equation $\phi(kx_1, kx_2, \dots, kx_n) = k\phi(x_1, x_2, \dots, x_n)$ must hold in the limit when $k$ is 0, that is $\phi(0, 0, \dots, 0) = 0$.

(5) We now have

$$k\phi(x_1, x_2, \dots, x_n) = kx_1\left[\frac{\partial\phi}{\partial x_1}\right] + \dots + kx_n\left[\frac{\partial\phi}{\partial x_n}\right],$$

or on dividing by $k$,

$$\phi(x_1, x_2, \dots, x_n) = x_1\left[\frac{\partial\phi}{\partial x_1}\right] + \dots + x_n\left[\frac{\partial\phi}{\partial x_n}\right].$$

(6) In this last equation let $k \to 0$. Then each of the quantities $\left[\partial\phi/\partial x_i\right]$ tends to a value which is independent of the $x$'s, and we can write $\phi(x_1, x_2, \dots, x_n) = c_1x_1 + \dots + c_nx_n$, where the $c$'s are independent of the $x$'s.

(7) By Axiom III the $c$'s must all be equal, so

$$\phi(x_1, x_2, \dots, x_n) = c(x_1 + x_2 + \dots + x_n).$$

(8) From Axiom I we have

$$\phi(x_1 + h, x_2 + h, \dots, x_n + h) = \phi(x_1, x_2, \dots, x_n) + h.$$
(9) If in this last equation we let the $x_i$ all approach zero, then we have $cnh = h$ and therefore $c = 1/n$, and finally

$$\phi(x_1, x_2, \dots, x_n) = \frac{1}{n}(x_1 + x_2 + \dots + x_n),$$

which states that $\phi \equiv$ the most probable value $=$ the Arithmetic Mean.

It should be noted that the first six steps involve only Axioms II and IV. Of these first six steps the second and sixth are questionable.

The sixth step involves the tacit assumption that the partial derivatives are functions of $k$. These partial derivatives are not necessarily functions of $k$. The example given above, viz. $f = \bar{x} + a\,\mu_3/\mu_2$, is a function whose partial derivatives are independent of $k$; in fact no function of the form

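$$F = \bar{x} + \sum_{r=1}^{\infty} a_r\,\frac{\sum_{i=1}^{n}(x_i - \bar{x})^{2r+1}}{\sum_{i=1}^{n}(x_i - \bar{x})^{2r}}$$

will satisfy the tacit assumption involved in the sixth step. Nor is $F$ the most general function which will not satisfy the tacit assumption; thus take for example

$$F = \bar{x} + \cdots + c\,\mu_3 \cdots$$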


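Consider now the second step. Take the function $\phi(y_1, y_2, \dots, y_n) = k\phi(x_1, x_2, \dots, x_n)$. Then, by Axiom II, we have $y_i = kx_i$. Apply the Theorem of the Mean Value to $\phi(y_i)$ instead of $\phi(x_i)$. Then

$$\phi(y_1, y_2, \dots, y_n) = \phi(0, 0, \dots, 0) + y_1\left[\frac{\partial\phi(y_1, \dots, y_n)}{\partial y_1}\right] + \dots + y_n\left[\frac{\partial\phi(y_1, \dots, y_n)}{\partial y_n}\right].$$

Now if we replace $y_i$ by $kx_i$, we obtain the equation given in the second step, except that the square brackets are now of the form

$$\left[\frac{\partial\phi(kx_1, kx_2, \dots, kx_n)}{\partial(kx_i)}\right] \quad\text{and not}\quad \left[\frac{\partial\phi(x_1, x_2, \dots, x_n)}{\partial x_i}\right]$$

as given by Whittaker and Robinson. It is difficult to decide whether by $\left[\partial\phi/\partial x_i\right]$ Whittaker and Robinson mean

$$\left[\frac{\partial\phi(x_1, x_2, \dots, x_n)}{\partial x_i}\right] \quad\text{or}\quad \left[\frac{\partial\phi(kx_1, kx_2, \dots, kx_n)}{\partial x_i}\right].$$

These last two expressions are not equal. To make the second step more clear it is necessary to demonstrate that

$$\left[\frac{\partial\phi(kx_1, kx_2, \dots, kx_n)}{\partial(kx_i)}\right] = \left[\frac{\partial\phi(x_1, x_2, \dots, x_n)}{\partial x_i}\right],$$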
and this has not been done. In order to demonstrate this equality further use must be made of Axiom II. It appears that the questionable features of the second step may be overcome by starting with the equation implied by Axiom II, thus

$$\phi(kx_1, kx_2, \dots, kx_n) = k\phi(x_1, x_2, \dots, x_n);$$

in other words $\phi$ is a homogeneous function of degree 1. Therefore use can be made of Euler's Theorem on homogeneous forms. In this way we obtain:

$$\phi(x_1, x_2, \dots, x_n) = \sum_{i=1}^{n} x_i\,\frac{\partial\phi}{\partial x_i},$$

which is an abbreviation of the last equation given in the fifth step. Now, making further use of Axiom II, we have:

$$\frac{\partial\phi(kx_1, kx_2, \dots, kx_n)}{\partial(kx_i)} = \frac{1}{k}\,\frac{\partial\phi(kx_1, kx_2, \dots, kx_n)}{\partial x_i} = \frac{1}{k}\,\frac{\partial}{\partial x_i}\,k\phi(x_1, x_2, \dots, x_n) = \frac{\partial\phi(x_1, x_2, \dots, x_n)}{\partial x_i}.$$

It follows that

$$\frac{\partial\phi(x_1, x_2, \dots, x_n)}{\partial x_i} = \frac{\partial\phi(kx_1, kx_2, \dots, kx_n)}{\partial(kx_i)}.$$

From this development we conclude that for any function whatever which satisfies Axiom II the last equation of the fifth step cannot possibly involve $k$.
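Euler's theorem and the scale invariance of the first derivatives can be confirmed exactly for the example $f = \bar{x} + a\,\mu_3/\mu_2$, which is homogeneous of degree 1. The sketch below uses illustrative data and hypothetical helper names, with exact arithmetic via `fractions.Fraction`. It relies on the closed-form gradient $\partial f/\partial x_i = 1/n + a\{3\mu_2[(x_i - \bar{x})^2 - \mu_2] - 2\mu_3(x_i - \bar{x})\}/(n\mu_2^2)$.

```python
from fractions import Fraction

def moments(xs):
    """Return (mean, mu_2, mu_3) computed exactly."""
    n = len(xs)
    m = sum(xs) / n
    return m, sum((x - m) ** 2 for x in xs) / n, sum((x - m) ** 3 for x in xs) / n

def f(xs, a=1):
    """f = mean + a * mu_3 / mu_2."""
    m, m2, m3 = moments(xs)
    return m + a * m3 / m2

def grad_f(xs, a=1):
    """Closed-form partial derivatives of f = mean + a * mu_3 / mu_2."""
    n = len(xs)
    m, m2, m3 = moments(xs)
    return [Fraction(1, n) + a * (3 * m2 * ((x - m) ** 2 - m2) - 2 * m3 * (x - m))
            / (n * m2 ** 2) for x in xs]

if __name__ == "__main__":
    xs = [Fraction(v) for v in (1, 3, 4, 9)]
    k = Fraction(3, 7)
    g = grad_f(xs)
    print(sum(x * d for x, d in zip(xs, g)) == f(xs))   # Euler: sum x_i df/dx_i = f
    print(grad_f([k * x for x in xs]) == g)             # derivatives free of k
    print(sum(g) == 1)                                  # Axiom I in derivative form
```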

In order to overcome the defect in the sixth step it is necessary to make a more restrictive assumption. Suppose that in place of Axiom IV we assume: “The most probable value, regarded as a function of the individual measures, has first partial derivatives with respect to them which are constant.” Then the equation given in the sixth step can be rigorously established.

After the equation of the sixth step is rigorously established there remains an objection in the seventh step. The axioms do not explicitly state that the $n$ observations must be functionally independent. Therefore suppose the $x_i$ are functionally dependent according to the relation $x_i = y_iz$, where the $y_i$ are all constant. Then the function $f = \bar{x} + \mu_3/\mu_2$ will have partial derivatives with respect to the $x_i$ which are unequal and constant. Yet at the same time the function $f$ is a symmetrical expression of the $n$ variables.

Hence, in order to establish the postulate of the Arithmetic Mean along the lines followed by Whittaker and Robinson, another restrictive assumption is needed. It is slightly different from that proposed in the last paragraph but one: assume (in addition to Axioms I and II) that the function has partial derivatives with respect to the $x_i$ which are equal.
Part 2

The first original paper consulted was one by Schiaparelli.² In this paper nine propositions are presented, four of which are also called lemmas. From a strict mathematical point of view the four propositions which Schiaparelli calls lemmas are really postulates. Schiaparelli discusses these four lemmas at length; three of these lemmas are the first three axioms given in Whittaker and Robinson's book. The fourth one is: “When, in the function $\phi$, all the variables $(x_i)$ take the same value $a$, the function itself becomes equal to $a$.” (This, as a matter of fact, is the definition of an average.)

In his discussion of these lemmas, which are based partly on practical and partly on philosophical grounds, Schiaparelli points out that they are justified from the practical or statistical nature of the problem involved in arriving at the most probable value (Schiaparelli uses the term “true value”) of a set of observations. In the present writer's opinion, these discussions are the most excellent part of Schiaparelli's paper. These discussions are even more significant in view of the fact that the later writers on this subject make no attempt whatsoever to justify the use of their postulates.

Schiaparelli remarks that we should have no reason for not expecting that a small change in a single observation should produce a small change in the function $\phi$; but he does not make this remark in the form of an explicit postulate. This could have been done and, moreover, such a postulate of continuity could be justified from the practical nature of the problem. It seems that a more elegant procedure would have been to deduce the continuity of the function and its derivatives from Axioms I and II. It will be shown later that this is possible. From his remark on the continuity of the function, Schiaparelli concludes that the partial derivatives of $\phi$ with respect to the $x_i$ exist and are continuous. His method of arriving at this conclusion is not valid, for it is well known that an arbitrarily assumed function may be everywhere continuous and yet possess a derivative at no point.

Schiaparelli's Proposition III states: “When in the function $\phi$ all the $x_i$ take the same value, then the $\partial\phi/\partial x_i$ become equal to each other.” This Proposition is false. To show this, consider the function

$$f = \bar{x} + \frac{\mu_3}{\mu_2},$$

where

$$\frac{\partial f}{\partial x_i} = \frac{1}{n} + \frac{3\mu_2[(x_i - \bar{x})^2 - \mu_2] - 2\mu_3(x_i - \bar{x})}{n\mu_2^2} .$$

² Giovanni Schiaparelli, Come si possa giustificare l'uso della media aritmetica nel calcolo dei risultati d'osservazione, Rendiconti Reale Istituto Lombardo di Scienze e Lettere, Vol. XL (1907), pp. 752-764.
Now, when the $x_i$ all approach $a$, both $f$ and $\partial f/\partial x_i$ become indeterminate forms. However, in this case $f$ takes an indeterminate form which can be evaluated: it can be shown that $\mu_3/\mu_2$ will always have the value zero, i.e., $f$ will have the value $a$ when all the $x_i = a$. The $\partial f/\partial x_i$, on the other hand, can take any value whatever, and in general the $\partial f/\partial x_i$ will not be equal when the $x_i \to a$. To illustrate, consider the observations $y_1 = 1$, $y_2 = 3$, $y_3 = 4$. Then $\bar{y} = 8/3$, $\mu_2 = 14/9$ and $\mu_3 = -20/27$, whence $f = 8/3 - 10/21$. Now assume that these three observations all approach 2 in a certain way, i.e., let $x_i = 2 + (y_i - 2)z$. Then $\bar{x} = 2 + (\bar{y} - 2)z = 2 + (2/3)z$,

$$\mu_2(x_i) = \frac{z^2}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2 = (14/9)z^2$$

and

$$\mu_3(x_i) = \frac{z^3}{n}\sum_{i=1}^{n}(y_i - \bar{y})^3 = (-20/27)z^3,$$

whence $f = 2 + (2/3)z - (10/21)z$. Clearly as $z \to 0$ the $x_i \to 2$ and $f \to 2$. However,

$$\frac{\partial f}{\partial x_1} = \frac{1}{3} + \frac{131}{294}, \qquad \frac{\partial f}{\partial x_2} = \frac{1}{3} - \frac{253}{294}, \qquad \frac{\partial f}{\partial x_3} = \frac{1}{3} + \frac{122}{294},$$

each evaluated at $x_i = 2 + (y_i - 2)z$. Thus the $\partial f/\partial x_i$ are not functions of $z$, and as the $x_i \to 2$ the $\partial f/\partial x_i$ remain constant and unequal.
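The illustration can be reproduced exactly. The sketch below uses the same assumptions as the text: $y = (1, 3, 4)$, $x_i = 2 + (y_i - 2)z$, and $f = \bar{x} + \mu_3/\mu_2$. It evaluates the closed-form $\partial f/\partial x_i$ with `fractions.Fraction` for several values of $z$ and confirms two things: the three derivatives do not change as $z \to 0$, and they are unequal. Function names are illustrative.

```python
from fractions import Fraction

def grad_f(xs):
    """Closed-form partial derivatives of f = mean + mu_3 / mu_2."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return [Fraction(1, n) + (3 * m2 * ((x - m) ** 2 - m2) - 2 * m3 * (x - m))
            / (n * m2 ** 2) for x in xs]

def path(z, ys=(1, 3, 4), target=2):
    """Points x_i = target + (y_i - target) z, which approach target as z -> 0."""
    return [target + (Fraction(y) - target) * z for y in ys]

if __name__ == "__main__":
    for z in (Fraction(1), Fraction(1, 10), Fraction(1, 10 ** 6)):
        print(z, [str(d) for d in grad_f(path(z))])
```

Every line shows the same three values, 229/294, -155/294 and 110/147. These are $1/3 + 131/294$, $1/3 - 253/294$ and $1/3 + 122/294$.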

From his conclusion that the derivatives of $\phi$ exist and from Axiom I, Schiaparelli obtains the equation $\sum_{i=1}^{n} \partial\phi/\partial x_i = 1$ (this equation being his Proposition V) in the following way. Since the derivatives of $\phi$ exist, then by the Theorem of the mean value,

$$\phi(x_1 + h, x_2 + h, \dots, x_n + h) = \phi(x_1, x_2, \dots, x_n) + h\left(\frac{\partial\phi}{\partial x_1} + \frac{\partial\phi}{\partial x_2} + \dots + \frac{\partial\phi}{\partial x_n}\right). \qquad (A)$$

By Axiom I:

$$\phi(x_1 + h, x_2 + h, \dots, x_n + h) = \phi(x_1, x_2, \dots, x_n) + h .$$
Whence $\sum_{i=1}^{n} \partial\phi/\partial x_i = 1$. Now this equation is correct, but the above proof of it is not convincing. Clearly, according to the Theorem of the mean value, in equation (A) it is necessary to replace each $x_i$ in the $\partial\phi/\partial x_i$ by $x_i + \theta h$, where $\theta$ is between 0 and 1.


Schiaparelli's Proposition VII states in effect that the $\partial\phi/\partial x_i$ are invariant under the transformation $x_i' = x_i + h$, where $h$ is constant. His Proposition IX states that the $\partial\phi/\partial x_i$ are invariant under the transformation $x_i' = kx_i$, where $k$ is a constant. These two propositions are correct and are correctly established. Making use of his Propositions III (which is false), V, VII and IX, Schiaparelli proceeds to the establishment of the postulate of the Arithmetic Mean, as follows.

Let $a = \phi(x_i)$. As the $x_i$ vary, $a$ varies, but for a particular set of $x_i$, $a$ is a constant. Now by Axiom I we have

$$a + (m - 1)a = \phi(x_1 + (m - 1)a, x_2 + (m - 1)a, \dots, x_n + (m - 1)a) = ma$$

for all values of $m > 1$. Then by Axiom II:

$$a = \phi\left(\frac{x_1 + (m - 1)a}{m}, \frac{x_2 + (m - 1)a}{m}, \dots, \frac{x_n + (m - 1)a}{m}\right) = \phi\left(\frac{x_1 - a}{m} + a, \frac{x_2 - a}{m} + a, \dots, \frac{x_n - a}{m} + a\right).$$

And by Propositions VII and IX, the $\partial\phi/\partial x_i$ are unchanged during the above transformations. Hence the last equation is true when $m \to \infty$. By Proposition III (false), the $\partial\phi/\partial x_i = 1/n$, since as $m \to \infty$, $\phi(x_i) = a$. In this final proof Schiaparelli gives a geometric illustration of each step.

It is both interesting and strange that, in closing his paper, Schiaparelli does not claim that the Arithmetic Mean is the only function which will satisfy all of his postulates. In fact he himself points out that the function $\phi$, implicitly defined by the equation $\sum_{i=1}^{n}(\phi - x_i)^m = 0$, where $m$ is an odd integer $> 1$, will satisfy all of his postulates. Furthermore he points out that this function will not satisfy his Proposition III. Schiaparelli's object was to establish the postulate of the Arithmetic Mean without any appeal to the concept of probability. To accomplish this he made four assumptions, each of which he justified by a priori reasoning. Then he proceeded with the above proof. Why he should have been satisfied with his own proof after perceiving the function defined by $\sum_{i=1}^{n}(\phi - x_i)^m = 0$ is hard to understand.
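Schiaparelli's implicit function is easy to evaluate numerically. For odd $m$, the left side of $\sum(\phi - x_i)^m = 0$ is nondecreasing in $\phi$ and changes sign between $\min x_i$ and $\max x_i$, so bisection applies. The sketch below computes the function for $m = 3$ and checks that it behaves like an average under Axioms I, II and III while differing from the Arithmetic Mean. The function name is illustrative.

```python
def odd_power_average(xs, m=3, tol=1e-12):
    """Solve sum_i (phi - x_i)^m = 0 for phi by bisection (m odd, m >= 1).
    The left side is nondecreasing in phi, <= 0 at min(xs) and >= 0 at max(xs)."""
    if m % 2 == 0:
        raise ValueError("m must be odd")
    lo, hi = min(xs), max(xs)
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if sum((mid - x) ** m for x in xs) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    xs = [1.0, 3.0, 4.0]
    phi = odd_power_average(xs)
    print(phi, sum(xs) / len(xs))                          # differs from the mean
    print(odd_power_average([x + 10 for x in xs]) - phi)   # Axiom I: about 10
    print(odd_power_average([2 * x for x in xs]) / phi)    # Axiom II: about 2
```

With $m = 1$ the same routine returns the Arithmetic Mean, which is a convenient sanity check.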
The second paper³ consulted was also by Schiaparelli. It is merely an abridged form of the one just discussed. Schiaparelli wrote two earlier papers on this same subject (altogether Schiaparelli wrote four papers on it). However, the footnotes in the paper just discussed suggest that it contains all of the material from the two earlier papers with which he himself was satisfied. Therefore Schiaparelli's two earlier papers were not consulted.

The third paper consulted was that by Broggi.⁴ Broggi states that the purpose of his paper is to establish the postulate of the Arithmetic Mean by purely analytic methods which are more brief than Schiaparelli's method. Broggi words the assumptions upon which he bases his proof as follows:

1. $\phi$ is a symmetric function of its $n$ variables;

2. The partial derivatives $\partial\phi/\partial x_i$ are single-valued and finite;

3. We have $\phi(kx_1, kx_2, \dots, kx_n) = k\phi(x_1, x_2, \dots, x_n)$;

4. We have $\phi(x_1 + h, x_2 + h, \dots, x_n + h) = \phi(x_1, x_2, \dots, x_n) + h$, that is to say

$$\frac{\partial\phi}{\partial x_1} + \frac{\partial\phi}{\partial x_2} + \dots + \frac{\partial\phi}{\partial x_n} = 1 . \qquad (a)$$

Broggi does not explain why he used postulate 2, but presumably it was in order to exclude the function defined by $\sum_{i=1}^{n}(\phi - x_i)^m = 0$. Consider the special case where $m = 3$. Then

$$n\phi^3 - 3\phi^2\sum x_i + 3\phi\sum x_i^2 - \sum x_i^3 = 0 .$$

Let

$$p = 3\left(\frac{1}{n}\sum x_i^2 - \bar{x}^2\right) \quad\text{and}\quad q = 3\bar{x}\,\frac{1}{n}\sum x_i^2 - 2\bar{x}^3 - \frac{1}{n}\sum x_i^3 .$$

Also put $R = (p/3)^3 + (q/2)^2$. Let $A$ be the real cube root of $-q/2 + \sqrt{R}$ and $B$ the real cube root of $-q/2 - \sqrt{R}$. Then the three branches of $\phi$ can be explicitly written

$$\phi_1 = A + B + \bar{x}, \qquad \phi_2 = \omega A + \omega^2 B + \bar{x}, \qquad \phi_3 = \omega^2 A + \omega B + \bar{x},$$

where $\omega$ and $\omega^2$ are the two complex cube roots of unity. Now while $\phi$ does not satisfy the postulate that the function be single-valued, $\phi_1$ satisfies this postulate as well as all the others, and so do $\phi_2$ and $\phi_3$. Hence Broggi's failure to comment at length on the function $\sum_{i=1}^{n}(\phi - x_i)^m = 0$ is unsatisfying. As a matter of fact Broggi fails to point out any of the defects of Schiaparelli's

* Giovanni Schiaparelli — Come si possa giustificare I’uso della media aritmetica nel 
calcolo delle misiire, senza fare aleuna ipotesi sulla legge di probabilitii degli errori acci- 
dentali, Astronomische Nachrichten, Band 176 (1907) pp. 206-212. 

* Ugo Broggi — Sur Le Principe De La Moyenne Arithmetique, L’Enseignement Mathe- 
matique, XI (1909) pp. 14-17. 
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paper, with the possible exception that he shows Schiaparelli’s postulate which 
states <f> =: a when each of the Xt = a to be a consequence of Axioms I and II. 
This is done so casually that it makes one wonder whether Broggi really was 
aware of the fact that Schiaparelli’s postulates are not independent. 

Broggi proves the Lemma: "A homogeneous function of the first degree which is a solution of the equation of partial derivatives (a) is an integral function." This Lemma is correct and is correctly proved, but its wording is apt to be misleading; in fact it appears that its true meaning was not clear to Broggi himself. For, while the function $\phi$ cannot be of the form $\psi/\chi$, where $\psi$ is a homogeneous function of the $p$th degree which satisfies Axiom I and $\chi$ a homogeneous function of the $(p-1)$th degree which also satisfies Axiom I, the Lemma does not mean, and Broggi has not proved, that $\phi$ cannot be of the form $\phi = \Omega + \psi/\chi$, where $\Omega$ is an integral function satisfying Axioms I and II and $\psi$ and $\chi$ are homogeneous functions of the $p$th and $(p-1)$th degrees respectively which are invariant under the transformation specified in Axiom I. By reason of this oversight, Broggi concludes that any function satisfying Axioms I and II must be linear in its $n$ variables, a conclusion which is erroneous.
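The last point can be checked numerically. The Python sketch below (the function names are hypothetical) evaluates the real branch $\phi_1 = A + B + \bar{x}$ of the cubic case $m = 3$, using $p$, $q$, $R$, $A$ and $B$ exactly as defined above. Since $p$ is three times the variance of the x's, $R \ge 0$, so every square root and cube root taken is real. The usage lines test the translation property of Axiom I and the homogeneity property of Axiom II on a small data set and show that $\phi_1$ is nevertheless not the arithmetic mean, i.e. not linear in the $x_i$.

```python
import math


def real_cbrt(v: float) -> float:
    """Real cube root, valid for negative arguments as well."""
    return math.copysign(abs(v) ** (1.0 / 3.0), v)


def phi1(xs: list[float]) -> float:
    """Real root of sum_i (phi - x_i)**3 = 0 by Cardano's formula.

    p, q, R, A and B follow the definitions in the text; xbar is the mean.
    """
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x * x for x in xs)
    s3 = sum(x ** 3 for x in xs)
    xbar = s1 / n
    p = 3 * s2 / n - 3 * s1 ** 2 / n ** 2
    q = 3 * s1 * s2 / n ** 2 - 2 * s1 ** 3 / n ** 3 - s3 / n
    R = (p / 3) ** 3 + (q / 2) ** 2
    sqrt_R = math.sqrt(max(R, 0.0))  # R >= 0 here; clamp tiny rounding noise
    A = real_cbrt(-q / 2 + sqrt_R)
    B = real_cbrt(-q / 2 - sqrt_R)
    return A + B + xbar


if __name__ == "__main__":
    xs = [0.0, 0.0, 1.0]
    h, k = 2.5, -3.0
    print(phi1(xs), sum(xs) / len(xs))                 # phi1 differs from the mean
    print(phi1([x + h for x in xs]) - (phi1(xs) + h))  # close to 0 (Axiom I)
    print(phi1([k * x for x in xs]) - k * phi1(xs))    # close to 0 (Axiom II)
```

For $x = (0, 0, 1)$ the root is $1/(1 + 2^{1/3}) \approx 0.4425$, whereas the mean is $1/3$.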

The fourth paper consulted was that by Schimmack.⁵ Schimmack's paper is in three sections. The first section contains the proof which is essentially that which Whittaker and Robinson give. In the second section Schimmack gives a different proof, from a set of new postulates. The new set of postulates is:

Axiom I' = Axiom I.

Axiom II'. The most probable value is independent of the sense of direction of the scale upon which the observed values (and the most probable value) are reckoned, that is to say,

$$\phi(-x_1, -x_2, \ldots, -x_n) = -\phi(x_1, x_2, \ldots, x_n).$$

Axiom III' = Axiom III.

Axiom IV'. If from $n$ observed values the most probable value $\phi_n$ be computed, and if one obtains an additional observed value, then the most probable value of the $n + 1$ observed values is the same as the most probable value of $n + 1$ quantities consisting of the initial most probable value counted $n$ times and the $(n + 1)$th observed value, namely:

$$\phi_{n+1}(x_1, x_2, \ldots, x_n, x_{n+1}) = \phi_{n+1}(\phi_n, \phi_n, \ldots, \phi_n, x_{n+1}).$$
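Axiom IV' is easy to test mechanically for any candidate rule. In the hypothetical helper below, `phi` is any function of a list of observations; the arithmetic mean passes the check, while the median (used here only as a contrasting example, not one discussed by Schimmack) fails it on the same data.

```python
from statistics import mean, median


def satisfies_axiom_iv_prime(phi, xs, x_new, tol=1e-12):
    """Check Axiom IV' for one data set xs and one additional value x_new.

    Compares phi(x_1, ..., x_n, x_{n+1}) with phi(phi_n, ..., phi_n, x_{n+1}),
    where phi_n = phi(x_1, ..., x_n) is repeated n times.
    """
    n = len(xs)
    phi_n = phi(xs)
    lhs = phi(list(xs) + [x_new])
    rhs = phi([phi_n] * n + [x_new])
    return abs(lhs - rhs) <= tol


if __name__ == "__main__":
    xs = [1.0, 2.0, 10.0]
    print(satisfies_axiom_iv_prime(mean, xs, 3.0))    # True
    print(satisfies_axiom_iv_prime(median, xs, 3.0))  # False: 2.5 versus 2.0
```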

In explaining the object of this second section, Schimmack says that postulating the existence of the derivatives (Axiom IV) seems unjustified and ought to be avoided, and that only such axioms should be made as the intrinsic character of the problem justifies. In connection with this statement of Schimmack's it appears that the intrinsic character of the problem certainly does not justify Axiom IV'. In fact, Axiom IV' appears to be quite artificial. Moreover, Schimmack does not attempt to justify Axiom IV' by a priori reasoning as Schiaparelli does for Axioms I, II, and III. While, if the Arithmetic Mean is the most probable value, Axiom IV' follows, since it is a property of the Arithmetic Mean, it does not seem to be in keeping with the intrinsic character of the problem to use this property as a starting point for later deductions.

⁵ Rudolf Schimmack, Der Satz vom arithmetischen Mittel in axiomatischer Begründung, Mathematische Annalen, Band 68 (1909) pp. 125-132, 304.

As regards Schimmack's objections to Axiom IV, all of the conditions specified by it can be deduced from the first two Axioms except that the derivatives must be single-valued. To show that this is true, consider an arbitrary function which satisfies Axioms I and II. Let this function be $\phi(x_1, x_2, \ldots, x_n)$. We do not know that $\phi$ is continuous or that $\phi$ has any derivatives. All we assume is that $\phi$ satisfies the first three Axioms, and it is here proven that $\phi$ must be continuous and have continuous partial derivatives. By Axiom I we can give increments to the $x_i$; hence we give each $x_i$ the same increment, $\Delta x$, and then subtract $\phi$, and we have

$$\phi(x_1 + \Delta x, x_2 + \Delta x, \ldots, x_n + \Delta x) - \phi(x_1, x_2, \ldots, x_n) = \Delta\phi,$$

but by Axiom I, $\Delta\phi = \Delta x$. Therefore

$$\frac{\Delta\phi}{\Delta x} = 1 = \frac{d\phi}{dx}.$$

In other words, the total derivative of $\phi$ exists and is constant. Therefore the total derivative of $\phi$ is continuous. But since the total derivative exists, all of the partial derivatives exist. By Axiom II, $\phi$ is a homogeneous function of the first degree. Applying Euler's Theorem for homogeneous forms, we have

$$\phi = x_1\frac{\partial\phi}{\partial x_1} + x_2\frac{\partial\phi}{\partial x_2} + \cdots + x_n\frac{\partial\phi}{\partial x_n}.$$

Since the total derivative of $\phi$ is everywhere continuous, $\phi$ is also everywhere continuous. Thus, the right hand side of the above equation is everywhere continuous and each partial derivative is therefore everywhere continuous.

As regards that part of Axiom IV which requires the $\partial\phi/\partial x_i$ to be single-valued, it would seem more satisfactory to postulate that the function $\phi$ is single-valued, for the single-valuedness of a derivative does not insure the single-valuedness of the integral, while the single-valuedness of a function does insure the single-valuedness of the derivative where the derivative exists.

In the third section of his paper, Schimmack shows Axioms I, II, III, and IV to be independent, and likewise Axioms I', II', III', and IV'.

Schimmack does not mention any of the questionable features of Schiaparelli's and Broggi's papers.

The fifth paper consulted was that by Suto.⁶ Suto's assumptions are:

1°. $\phi(x, x, \ldots, x) = x$ (This is Schiaparelli's).

2°. $\phi(x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n) - \phi(x_1, x_2, \ldots, x_n)$ depends on the values of $y_1, y_2, \ldots, y_n$ only.

3°. Axiom III (= Axiom III').

⁶ Onosaburo Suto, Law of the Arithmetical Mean, Tohoku Mathematical Journal, Vol. 6 (1914) pp. 79-81.
Suto says he believes these assumptions to be more simple and natural than Schimmack's Axioms. However, assumption 2° appears to be quite artificial and very restrictive. Suto does not even attempt to justify it by a priori reasoning.

Suto shows his three Axioms to be independent. It is interesting to note that Suto has established the postulate of the Arithmetic Mean rigorously using only three postulates, while Schiaparelli, Broggi and Schimmack failed using four postulates. In this connection it should be observed that when Axiom IV as given by Whittaker and Robinson is replaced by "The most probable value, regarded as a function of the individual measures, has first partial derivatives with respect to them which are equal," as suggested at the end of Part 1, then Axiom III can be deduced as a consequence of Axioms I, II and the reworded Axiom IV, so that three Axioms only are sufficient to deduce the postulate of the Arithmetic Mean. However, it would be difficult to justify the reworded Axiom IV from the nature of this problem of the Arithmetic Mean.

Suto does not point out any of the defects of the preceding papers. 

The last paper consulted was that by Beetle.⁷ It deals with the third section of Schimmack's paper. Beetle also fails to point out any of the defects of the preceding papers.

Conclusion 

The postulate of the Arithmetic Mean can be rigorously established, without 
the use of the concept of probability, if sufficiently restrictive assumptions are 
made. The writers making sufficiently restrictive assumptions have failed to 
justify the use of them. Several proofs of the postulate of the Arithmetic 
Mean are clearly erroneous. The existing attempts to establish the postulate of
the Arithmetic Mean without any appeal to the concept of probability are,
therefore, unsatisfactory.
THE SHRINKAGE OF THE BROWN-SPEARMAN PROPHECY FORMULA

By Robert J. Wherry 


At the recent meeting of the Conference on Individual Psychological Differ- 
ences held in Washington, Dr. Clark Hull of Yale University called attention to 
the fact that the much used Brown-Spearman formula involves, or leads to, if
used without regard to certain limitations, a certain over-optimism.¹ In other
words, if only this formula is taken into account, one would assume that the mere
increase in the length of a test would automatically and, with continued increases
in length, indefinitely continue to increase its reliability or validity.

On the other hand, we know that the greater the number of test units the 
greater the shrinkage between the predicted and actually obtained value. At 
least we know this to be true when the value in question is a multiple correlation 
coefficient and the test units are independent variables. Hull raised the question 
as to whether or not the same fact might be true of the figures predicted by the 
Brown-Spearman formula. It is the purpose of this article to show that this
shrinkage does occur, and that the Wherry-Smith shrinkage formula² satisfactorily
predicts this shrinkage.

A quick review of the nature of the two formulae (the Brown-Spearman and 
the Wherry-Smith formulae) will at once show the importance of the discussion. 
The Brown-Spearman formula, as applied to the predicting of reliability, reads 
as follows. 


$$R = \frac{Mr_{11}}{1 + (M - 1)r_{11}} \tag{1}$$

where R = the predicted reliability,
$r_{11}$ = the discovered reliability,
and M = the number of times the test is lengthened. Thus the formula provides that the predicted reliability (R) will increase with each increase in M, but it is to be noted that the increase in R decreases with each increase in M as the value of R approaches its limit of plus one.

On the other hand the Wherry-Smith formula, which reads

$$\bar{R}^2 = \frac{(N - 1)R^2 - (M - 1)}{N - M}, \tag{2}$$

where $\bar{R}$ = the predicted value of the correlation,
R = the discovered correlation,
M = the number of independent variables,
and N = the statistical population (the number of cases), provides that, for each increase in M, the shrinkage in $\bar{R}$ as compared with R increases. Thus, if we assume that the M's in the two formulae are analogous, i.e., if we assume the Wherry-Smith formula to be applicable to the Brown-Spearman formula, we see that as M increases the Brown-Spearman formula adds a decreasing increment while the Wherry-Smith formula provides that an increasing decrement be subtracted; thus eventually we arrive at a point where by further increasing the length of the test we will decrease rather than increase the size of the reliability coefficient.

TABLE I

Correlations Observed and Theoretical (Based upon Observed Means)

(N = 37 throughout)

| M | Observed average | Predicted, Br.-Sp. | Predicted, Wherry | Error, Br.-Sp. | Error, Wherry |
|---|---|---|---|---|---|
| (Trait 1) | | | | | |
| 1 | .290 | | | | |
| 5 | .728 | | .618 | -.057 | -.110 |
| 10 | .717 | | .726 | .086 | .009 |
| 15 | .754 | | .758 | .106 | .004 |
| 20 | .805 | .891 | .825 | .087 | .020 |
| 30 | .936 | .925 | .509 | -.011 | -.427 |
| (Trait 5) | | | | | |
| 1 | .419 | | | | |
| 5 | .736 | .783 | .751 | .047 | |
| 10 | .845 | .878 | .834 | .033 | |
| 15 | .887 | .915 | .856 | .028 | |
| 20 | .877 | .935 | .856 | .058 | |
| 30 | .876 | .956 | .745 | .080 | |
| (Trait 10) | | | | | |
| 1 | .354 | | | | |
| 5 | .479 | | | .254 | .213 |
| 10 | .717 | | | .129 | .071 |
| 15 | .852 | | | .040 | -.036 |
| 20 | .636 | | | .279 | .186 |
| 30 | .805 | | | .138 | -.150 |

(All Traits): only the entries .320, .822, .576 and -.296 are legible in the source.

If our hypothesis be true, we must, then, in order to predict the correct value of $\bar{R}$, substitute the value of equation (1) in equation (2). Doing this we have

$$\bar{R}^2 = \frac{(N - 1)M^2r_{11}^2 - (M - 1) - 2(M - 1)^2r_{11} - (M - 1)^3r_{11}^2}{(N - M)\left[1 + 2(M - 1)r_{11} + (M - 1)^2r_{11}^2\right]} \tag{3}$$

which would then be the form in which the Brown-Spearman formula should be used in predicting reliability corrected for chance error by the Wherry-Smith formula. The same result can of course be secured by applying the formulae consecutively.

TABLE II

Error in Predicting Reliability (Based upon Observed Means)

| Error | Brown-Spearman | Wherry |
|---|---|---|
| over .210 | 2 | 1 |
| .151 to .210 | | 1 |
| .091 to .150 | 3 | |
| .031 to .090 | 8 | 1 |
| -.029 to .030 | 3 | 6 |
| -.089 to -.030 | 1 | 3 |
| -.149 to -.090 | | 3 |
| -.209 to -.150 | | |
| below -.209 | | 2 |

TABLE III

Rietz Criteria of Normality Applied to Results from Means

| Criterion | Normal | Brown-Spearman | Wherry |
|---|---|---|---|
| Mean error | 0 | .074 | -.032 |
| β₁ | 0 | .561 | -.283 |
| β₂ | 3 | 2.008 | 3.180 |
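The remark that (3) gives the same result as applying (1) and (2) one after the other is easy to confirm numerically. A minimal Python sketch follows; the function names are hypothetical, `r11` stands for $r_{11}$, and a negative shrunken square (possible when M approaches N) is reported as NaN, which is a choice of this sketch rather than anything in the article.

```python
import math


def brown_spearman(r11: float, M: float) -> float:
    """Formula (1): predicted reliability of a test lengthened M times."""
    return M * r11 / (1 + (M - 1) * r11)


def wherry_smith_sq(R: float, M: float, N: int) -> float:
    """Formula (2): shrunken squared correlation for M variables and N cases."""
    return ((N - 1) * R ** 2 - (M - 1)) / (N - M)


def formula_3_sq(r11: float, M: float, N: int) -> float:
    """Formula (3): the closed form obtained by substituting (1) in (2)."""
    d = 1 + 2 * (M - 1) * r11 + (M - 1) ** 2 * r11 ** 2
    num = ((N - 1) * M ** 2 * r11 ** 2 - (M - 1)
           - 2 * (M - 1) ** 2 * r11 - (M - 1) ** 3 * r11 ** 2)
    return num / ((N - M) * d)


def shrunken_reliability(r11: float, M: float, N: int) -> float:
    """Apply (1), then (2), and take the square root (NaN if negative)."""
    r2 = wherry_smith_sq(brown_spearman(r11, M), M, N)
    return math.sqrt(r2) if r2 >= 0 else float("nan")


if __name__ == "__main__":
    # Trait 1 of Table I: observed reliability .290 for one judgment, N = 37
    for M in (5, 10, 15, 20, 30):
        print(M, round(brown_spearman(0.290, M), 3),
              round(shrunken_reliability(0.290, M, 37), 3))
```

For M = 5 and M = 10 the shrunken values round to the .618 and .726 entered in the Wherry column of Table I for Trait 1.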

In order to test formula (3), the writer has applied it to some empirical data. A recent article by H. H. Remmers of Purdue University furnishes the needed data. Remmers' study dealt with the increase in reliability due to increase in the number of judgments of certain traits of college professors.³ His results, together with the results of applying formula (3) to the data, are shown in Table I.

An inspection of Table I shows at once that while the Brown-Spearman formula gives results which are consistently too large (15 out of 17 times), the Wherry-Smith formula gives results which are more nearly equally distributed between positive and negative errors (7 to 10), tending slightly to underestimate. The actual distribution of errors can be more easily seen by an inspection of Table II.

TABLE IV

Correlations Observed and Theoretical (Based upon Observed Medians)

(N = 37 throughout)

| M | Observed medians | Predicted, Br.-Sp. | Predicted, Wherry | Error, Br.-Sp. | Error, Wherry |
|---|---|---|---|---|---|
| (Trait 1) | | | | | |
| 1 | .344 | | | | |
| 5 | .752 | .724 | .682 | -.028 | -.070 |
| 10 | .663 | .840 | .779 | .177 | .116 |
| 15 | .702 | .887 | .807 | .185 | .105 |
| 20 | .805 | .913 | .805 | .108 | .000 |
| 30 | .936 | .940 | .635 | .004 | -.301 |
| (Trait 5) | | | | | |
| 1 | .450 | | | | |
| 5 | .760 | .804 | .776 | .040 | .016 |
| 10 | .856 | .891 | .852 | .035 | -.004 |
| 15 | .931 | .925 | .873 | -.006 | -.058 |
| 20 | .877 | .942 | .874 | .065 | -.003 |
| 30 | .876 | .961 | .778 | .085 | -.098 |
| (Trait 10) | | | | | |
| 1 | .363 | | | | |
| 5 | .433 | .740 | .701 | .307 | .268 |
| 10 | .754 | .851 | .795 | .097 | .041 |
| 15 | .872 | .895 | .822 | .023 | -.050 |
| 20 | .898 | .919 | .820 | .021 | -.078 |
| 30 | .872 | .945 | .669 | .073 | -.203 |
| (All Traits) | | | | | |
| 1 | .503 | | | | |
| 20 | .898 | .953 | .879 | .055 | -.019 |
| 30 | .872 | .968 | .829 | .096 | -.043 |
Now, if our formula were perfectly correct, we should expect that the errors incurred by its use would be normally distributed about a mean error of zero. The Rietz criteria for normality of distribution were applied to these errors with results as shown in Table III.⁴ It can be readily seen that the Wherry correction formula gave much better results than did the uncorrected Brown-Spearman formula when measured by the Rietz criteria.
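The Rietz criteria are not spelled out in the article. The sketch below assumes they are the mean error together with the moment ratios $\beta_1 = \mu_3^2/\mu_2^3$ and $\beta_2 = \mu_4/\mu_2^2$ computed from central moments, whose normal values 0, 0 and 3 match the "Normal" column of Tables III and VI. Because $\beta_1$ defined this way cannot be negative, the negative $\beta_1$ entries in Table III suggest the article used a signed variant, so treat this as an illustration of the idea rather than a reproduction of the tables. The function name is hypothetical.

```python
def rietz_criteria(errors):
    """Return (mean, beta1, beta2) from the central moments of `errors`.

    Assumed criteria: beta1 = mu3**2 / mu2**3 and beta2 = mu4 / mu2**2,
    whose values for a normal distribution are 0 and 3.
    """
    n = len(errors)
    mean = sum(errors) / n
    mu2 = sum((e - mean) ** 2 for e in errors) / n
    mu3 = sum((e - mean) ** 3 for e in errors) / n
    mu4 = sum((e - mean) ** 4 for e in errors) / n
    return mean, mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2


if __name__ == "__main__":
    # Symmetric, flat-topped toy errors: beta1 is 0 and beta2 falls below 3
    print(rietz_criteria([-0.2, -0.1, 0.0, 0.1, 0.2]))
```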

All of the results in the first three tables are based upon the means of the results obtained by Remmers, since this was the method used in his paper. However, when the number of cases is small, as it was in this study, it is sometimes preferable to use the median rather than the mean as a basis of calculation, since the median is less affected by extreme cases. The writer has therefore recalculated the problem on the basis of the medians discovered by Remmers, and the results are given in Tables IV, V, and VI. The results were found to differ but little from those based upon the means of the distributions.

TABLE V

Error in Predicting Reliability (Based upon Observed Medians)

| Error | Brown-Spearman | Wherry |
|---|---|---|
| over .210 | 1 | 1 |
| .151 to .210 | 2 | |
| .091 to .150 | 3 | 2 |
| .031 to .090 | 5 | 1 |
| -.029 to .030 | 6 | 5 |
| -.089 to -.030 | | 5 |
| -.149 to -.090 | | 1 |
| -.209 to -.150 | | 1 |
| below -.209 | 1 | 1 |

TABLE VI

Rietz Criteria of Normality Applied to Results from Medians

| Criterion | Normal | Brown-Spearman | Wherry |
|---|---|---|---|
| Mean error | 0 | .074 | |
| β₁ | 0 | .497 | |
| β₂ | 3 | 1.699 | 2.284 |

If we now assume that formula (3) has been empirically established and justified, we must still answer a very practical question, namely, "How long shall we make our tests in order to achieve the greatest reliability?" To answer this question we must find the point at which R becomes a maximum with respect to changes in M, assuming $r_{11}$ and N to be constant terms. To find this point we must find the derivative of equation (3) with respect to M and set the numerator equal to zero. Thus, if we write formula (3) in a slightly more usable form, we have

$$\bar{R}^2 = \frac{(N - 1)M^2r_{11}^2}{(N - M)\left(1 + 2[M - 1]r_{11} + [M - 1]^2r_{11}^2\right)} - \frac{M - 1}{N - M}, \tag{4}$$

whence

$$\frac{d\bar{R}^2}{dM} = -\frac{(N - 1)(1 - r_{11})\left(1 + [M - 1]r_{11}\right)\left\{4r_{11}^2M^2 - \left[2Nr_{11}^2 - 3r_{11}(1 - r_{11})\right]M + (1 - r_{11})^2\right\}}{(N - M)^2\left(1 + [M - 1]r_{11}\right)^4},$$

which causes R to reach a maximum or minimum when the numerator is placed equal to zero. Thus, placing the numerator equal to zero and factoring this equation, we find its roots to be

$$M = -\frac{1 - r_{11}}{r_{11}} \tag{5a}$$

or

$$M = \frac{2Nr_{11} - 3(1 - r_{11}) - \sqrt{4N^2r_{11}^2 - 12Nr_{11}(1 - r_{11}) - 7(1 - r_{11})^2}}{8r_{11}} \tag{5b}$$

or

$$M = \frac{2Nr_{11} - 3(1 - r_{11}) + \sqrt{4N^2r_{11}^2 - 12Nr_{11}(1 - r_{11}) - 7(1 - r_{11})^2}}{8r_{11}} \tag{5c}$$

and by substituting actual values of N and $r_{11}$ in the equations, we find that equation (5c) is the root we are seeking, i.e., the value of M for which R becomes a maximum.

TABLE VII

Showing the value of M which will give a maximum value for R
(According to the Brown-Spearman-Wherry-Smith formula)

| N | r₁₁ = .30 | .40 | .50 | .60 | .70 | .80 | .90 |
|---|---|---|---|---|---|---|---|
| 10 | 3 | 4 | 4 | 4 | 5 | 5 | 5 |
| 20 | 8 | 9 | 9 | 10 | 10 | 10 | 10 |
| 30 | 13 | 14 | 14 | 14 | 15 | 15 | 15 |
| 40 | 18 | 19 | 19 | 19 | 20 | 20 | 20 |
| 50 | 23 | 24 | 24 | 24 | 25 | 25 | 25 |
| 60 | 28 | 29 | 29 | 29 | 30 | 30 | 30 |
| 70 | 33 | 34 | 34 | 34 | 35 | 35 | 35 |
| 80 | 38 | 39 | 39 | 39 | 40 | 40 | 40 |
| 90 | 43 | 44 | 44 | 44 | 45 | 45 | 45 |
| 100 | 48 | 49 | 49 | 49 | 50 | 50 | 50 |
It can also be readily seen that the value under the radical approximates a perfect square, namely the square of the quantity outside of the radical, $2Nr_{11} - 3(1 - r_{11})$, from which it falls short by $16(1 - r_{11})^2$; the radical therefore approximates that quantity for large values of N. Thus, when N is large (exceeds 100) we may secure satisfactory approximations to M if we rewrite equation (5c) in the form below

$$M \approx \frac{N}{2} - \frac{3(1 - r_{11})}{4r_{11}} \tag{5d}$$

Table VII shows the results of equation (5c) for values between N = 10 and N = 100 (by increments of 10) for values of $r_{11}$ from .10 to .90 (by increments of .10). The use of the formula does not yield integers, and so the results in the table are recorded to the nearest whole number rather than exactly as given by the formula.
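Equations (5c) and (5d) can be evaluated directly, and (5c) can be cross-checked by simply searching for the M that maximizes (3). The sketch below does both (function names are hypothetical). When the radicand of (5c) is negative, which happens for small N combined with small $r_{11}$ (for example N = 10 with $r_{11}$ = .10), there is no interior maximum and the function returns `None`; this handling is an assumption of the sketch.

```python
import math


def optimal_M(r11: float, N: int):
    """Root (5c): the M at which R-bar from (3) is largest.

    Returns None when the radicand is negative (no interior maximum).
    """
    s = 1 - r11
    outside = 2 * N * r11 - 3 * s
    radicand = 4 * N ** 2 * r11 ** 2 - 12 * N * r11 * s - 7 * s ** 2
    if radicand < 0:
        return None
    return (outside + math.sqrt(radicand)) / (8 * r11)


def optimal_M_approx(r11: float, N: int) -> float:
    """Large-N approximation (5d)."""
    return N / 2 - 3 * (1 - r11) / (4 * r11)


def shrunken_sq(r11: float, M: float, N: int) -> float:
    """Formula (3), evaluated as (1) followed by (2)."""
    R = M * r11 / (1 + (M - 1) * r11)
    return ((N - 1) * R ** 2 - (M - 1)) / (N - M)


if __name__ == "__main__":
    # Rows N = 10 and N = 100 of Table VII, r11 = .30, .40, ..., .90
    for N in (10, 100):
        print(N, [round(optimal_M(r / 10, N)) for r in range(3, 10)])
    # Remmers' case: N = 37, r11 = .290 (Trait 1 of Table I)
    print(optimal_M(0.290, 37), optimal_M_approx(0.290, 37))
```

For N = 37 and $r_{11}$ = .290 the root (5c) is about 16.6, inside the 15 to 20 range mentioned below, and a brute-force search over M lands at the same point.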

If, in order to test the validity of formula (5c), we apply it to the values in 
Tables I and IV, we find fairly close agreement. The formula in each case pre- 
dicts a maximum value for R when M lies between 15 and 20, and in the actually 
lengthened tests R is found to be a maximum when M is 30, 15, 15, 20, 30, 15, 20, 
and 20, thus being in agreement six times out of eight. 


Conclusions 

1. The Brown-Spearman formula appears to give results which contain both constant and chance errors.

2. These errors can be practically eliminated by applying the Wherry-Smith correction formula to the results obtained by the Brown-Spearman formula.

3. We may find the value of M which will give the greatest value of R by substitution in equation (5c) above, and then, by substitution of this value in equation (3), find the most probable value of R at its maximum point.

4. For large values of N we may secure satisfactory approximations to M by means of the simpler formula (5d).
THE LIKELIHOOD TEST OF INDEPENDENCE IN CONTINGENCY TABLES


By S. S. Wilks 

J. Neyman and E. S. Pearson¹ have applied the principle of the ratio of likelihoods to the problem of determining criteria for testing various hypotheses about the group frequencies in problems dealing with grouped data. In particular, they have discussed the fundamental problem, the test of goodness of fit, the hypothesis that two samples of grouped data are from the same population, and the hypothesis of independence in contingency tables. In their treatment of these problems, these authors have started from the limiting form of the probability of an observed set of frequencies and have shown that approximately each of the appropriate λ's is a function of the minimum value of a corresponding χ². The distribution of this minimum value is found, from which the significance test is made.

In certain cases the exact values of the λ's are relatively simple functions of the observations which can be as conveniently calculated as the corresponding χ²'s. The purpose of this note is to consider the exact expressions for the λ's and find their asymptotic distributions in large samples for the following hypotheses: (1) that a sample of grouped data is from a population with specified group frequencies (i.e., the fundamental χ² problem), (2) that several samples of grouped data are from the same population, and (3) that there is independence in a contingency table.


1. The fundamental problem. Let $p_1, p_2, \ldots, p_k$ be the probabilities of the mutually exclusive events $E_1, E_2, \ldots, E_k$ respectively. In a sample of N events the probability that $E_1, E_2, \ldots, E_k$ will occur $n_1, n_2, \ldots, n_k$ times respectively is given by

$$C = \frac{N!}{n_1!\,n_2!\cdots n_k!}\,p_1^{n_1}p_2^{n_2}\cdots p_k^{n_k}. \tag{1}$$

If we let Ω be the class of all sets of values of the p's such that their sum is unity, there is only one set of p's that maximizes C, namely, $p_j = n_j/N$ ($j = 1, 2, \ldots, k$). The maximum of C is

$$C(\Omega\ \max) = \frac{N!}{n_1!\,n_2!\cdots n_k!}\cdot\frac{n_1^{n_1}n_2^{n_2}\cdots n_k^{n_k}}{N^N}. \tag{2}$$

¹ Biometrika, vol. 20A (1928), pp. 263-294.
The likelihood of the hypothesis that the sample is from a population specified by p's having the values $p_1', p_2', \ldots, p_k'$ is defined as the ratio of C, evaluated at these values, to its maximum (2):

$$\lambda_s = \left(\frac{Np_1'}{n_1}\right)^{n_1}\left(\frac{Np_2'}{n_2}\right)^{n_2}\cdots\left(\frac{Np_k'}{n_k}\right)^{n_k}. \tag{3}$$

$\lambda_s$ is a quantity which clearly lies between 0 and 1. It will be 1 only when $p_j' = n_j/N$ ($j = 1, 2, \ldots, k$), that is, when the hypothesis is rigorously supported by the sample, and it tends to 0 as the sample values $n_j/N$ diverge more and more from the hypothetical values $p_j'$. The problem of making an exact test of significance of an observed value of $\lambda_s$ would involve the computation of all terms of the form (1) whose n's make $\lambda_s$ less than its observed value. This, of course, is impracticable except perhaps for the binomial case with small values of N. However, if the n's are large we can find an approximate solution. If we let $x_j = (n_j - Np_j')/\sqrt{N}$, then, except for terms of order $1/\sqrt{N}$ and higher, the x's are distributed according to the law

$$\frac{1}{\sqrt{(2\pi)^{k-1}p_1'p_2'\cdots p_k'}}\exp\Big(-\frac{1}{2}\sum_j\frac{x_j^2}{p_j'}\Big), \tag{4}$$

where $\sum_j x_j = 0$. Neglecting terms of order $1/\sqrt{N}$ and higher we easily find (using natural logarithms)

$$-2\log\lambda_s = \sum_j\frac{x_j^2}{p_j'}.$$

Therefore, if $\theta = -2\log\lambda_s$, $\theta$ is approximately distributed according to the function

$$f(\theta) = \frac{1}{2^{(k-1)/2}\,\Gamma\big(\frac{k-1}{2}\big)}\,\theta^{\frac{k-3}{2}}e^{-\theta/2}, \tag{5}$$

which is the χ² distribution with k - 1 degrees of freedom.

Since we have neglected terms of order $1/\sqrt{N}$ in obtaining (4), there is no theoretical reason why χ² should be used in preference to $-2\log\lambda_s$ as the criterion for testing the hypothesis that the sample is from a population specified by $p_1', p_2', \ldots, p_k'$. Any practical advantage which $-2\log\lambda_s$ may have will therefore justify its use.
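For small N the exact test mentioned above can actually be carried out. The sketch below (hypothetical names) enumerates every set of counts $n_1, \ldots, n_k$ summing to N, evaluates the term (1) and $\lambda_s$ from (3) for each, and adds up the terms whose $\lambda_s$ does not exceed the observed one. Counting ties with the observed value, and using a small tolerance to detect them, is a convention of this sketch. The number of terms grows roughly like $N^{k-1}$, which is why the text calls the exact test impracticable beyond small cases.

```python
import math


def compositions(N: int, k: int):
    """Yield every k-tuple of non-negative integers that sums to N."""
    if k == 1:
        yield (N,)
        return
    for first in range(N + 1):
        for rest in compositions(N - first, k - 1):
            yield (first,) + rest


def multinomial_prob(ns, ps) -> float:
    """A term of the form (1): N!/(n_1!...n_k!) p_1^n_1 ... p_k^n_k."""
    coef = math.factorial(sum(ns))
    for n in ns:
        coef //= math.factorial(n)
    return coef * math.prod(p ** n for p, n in zip(ps, ns))


def log_lambda_s(ns, ps) -> float:
    """Natural log of (3); a cell with n_j = 0 contributes a factor of 1."""
    N = sum(ns)
    return sum(n * math.log(N * p / n) for n, p in zip(ns, ps) if n > 0)


def exact_lambda_test(observed, ps, tol=1e-9) -> float:
    """Total probability of all samples whose lambda_s <= the observed one."""
    cutoff = log_lambda_s(observed, ps) + tol  # tolerance absorbs rounding in ties
    return sum(multinomial_prob(ns, ps)
               for ns in compositions(sum(observed), len(observed))
               if log_lambda_s(ns, ps) <= cutoff)


if __name__ == "__main__":
    # Binomial case: 8 and 2 occurrences in N = 10 trials with p = 1/2
    print(exact_lambda_test((8, 2), (0.5, 0.5)))  # 112/1024 = 0.109375
```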


2. The hypothesis that several samples of grouped data are from a common population. Let $p_{t1}, p_{t2}, \ldots, p_{ts}$ be the probabilities with which the mutually exclusive events $E_{t1}, E_{t2}, \ldots, E_{ts}$ occur, where $\sum_j p_{tj} = 1$ ($t = 1, 2, \ldots, r$). Then in a sample of $N_t$ events the chance that $E_{t1}, E_{t2}, \ldots, E_{ts}$ will occur $n_{t1}, n_{t2}, \ldots, n_{ts}$ times respectively is given by an expression similar to (1). The chance of the joint occurrence of the r samples is

$$\prod_{t=1}^{r}\frac{N_t!}{n_{t1}!\,n_{t2}!\cdots n_{ts}!}\,p_{t1}^{n_{t1}}p_{t2}^{n_{t2}}\cdots p_{ts}^{n_{ts}}. \tag{6}$$
We are interested in testing the hypothesis that the r samples are from the same population, that is, that the r sets of p's $p_{t1}, p_{t2}, \ldots, p_{ts}$ ($t = 1, 2, \ldots, r$) are the same. The likelihood criterion $\lambda_c$ appropriate to this hypothesis is the ratio of the maximum (ω(max)) of (6) subject to the condition that the sets of p's are the same (that is, $p_{tj} = p_j$, say, $t = 1, 2, \ldots, r$; $j = 1, 2, \ldots, s$) to the maximum (Ω(max)) of (6) without this restriction.

For convenience let the observations be arranged in table form so that $n_{ij}$ is the frequency in the i-th row and j-th column. Let $n_{i.}$ and $n_{.j}$ be the totals of the i-th row and j-th column respectively, and N the total of all observations. Thus $n_{i.}$ is the same as $N_i$. The expression for $\lambda_c$ will be

$$\lambda_c = \frac{\prod_i n_{i.}^{\,n_{i.}}\ \prod_j n_{.j}^{\,n_{.j}}}{N^N\prod_{i,j}n_{ij}^{\,n_{ij}}}. \tag{7}$$

It can be shown analytically that $\lambda_c$ lies between 0 and 1. It can be 1 only when

$$\frac{n_{1j}}{N_1} = \frac{n_{2j}}{N_2} = \cdots = \frac{n_{rj}}{N_r}, \qquad j = 1, 2, \ldots, s,$$

that is, when the hypothesis of a common population is perfectly substantiated by the samples. Because the $n_{ij}$ are integers, it is clear that $\lambda_c$ can be 1 only in exceptional cases, but it can take on values arbitrarily near 1 for sufficiently large values of the $n_{ij}$.

If the $N_i$ are large, the quantities $x_{ij} = (n_{ij} - N_ip_j)/\sqrt{N_i}$ are approximately distributed according to the function

$$F = \prod_{i=1}^{r}\left[\frac{1}{\sqrt{(2\pi)^{s-1}p_1p_2\cdots p_s}}\exp\Big(-\frac{1}{2}\sum_j\frac{x_{ij}^2}{p_j}\Big)\right], \tag{8}$$

where $\sum_j x_{ij} = 0$, $i = 1, 2, \ldots, r$. By neglecting terms of order $1/\sqrt{N}$ and higher, we find that

$$-2\log\lambda_c = \sum_{i,j}\frac{x_{ij}^2}{p_j} - \sum_j\frac{\big(\sum_i\sqrt{N_i}\,x_{ij}\big)^2}{Np_j}. \tag{9}$$

Denoting the quantity on the right side of (9) by $\chi_0^2$, it follows by straightforward analysis that the characteristic function $\varphi(t)$ of $\chi_0^2$, defined by the $r(s-1)$-tuple integral

$$\varphi(t) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}e^{it\chi_0^2}\,F\,dx_{11}\cdots dx_{r,s-1},$$

has the value

$$\varphi(t) = (1 - 2it)^{-\frac{(r-1)(s-1)}{2}}. \tag{10}$$

But it is well known that (10) is the characteristic function of any quantity distributed according to (5) with (k - 1) replaced by (r - 1)(s - 1). This, of course, is the χ² distribution with (r - 1)(s - 1) degrees of freedom.

It will be noticed that the exact value of $\lambda_c$ is a function of the observations $n_{ij}$ which is independent of the p's, while the approximate value of $-2\log\lambda_c$ as given by (9) involves the p's. Before (9) could be used practically, one would have to replace the p's by sample estimates, thus making further approximations necessary in order to get the distribution. If the usual estimates $p_j = n_{.j}/N$ are used for the p's in $\chi_0^2$, we find that $\chi_0^2$ reduces to

$$\sum_{i,j}\frac{\Big(n_{ij} - \frac{n_{i.}n_{.j}}{N}\Big)^2}{\frac{n_{i.}n_{.j}}{N}}, \tag{11}$$

which is the familiar χ² function for testing independence in contingency tables. However, (11) differs from $\chi_0^2$ by terms of the same order (i.e., $1/\sqrt{N_i}$) as those by which $\chi_0^2$ differs from $-2\log\lambda_c$. Since we have neglected terms of the same order in obtaining (8), there is no theoretical reason why (11) should be used rather than $-2\log\lambda_c$ for testing the hypothesis that the r samples are from a common population.

3. The hypothesis of independence in contingency tables. We shall consider a sample of N observations which can be arranged in a two-way contingency table having r rows and s columns. Let $p_{ij}$ be the probability that an observation will fall in the i-th row and j-th column. The probability that the sample of N items will be distributed so that $n_{ij}$ will be the number falling in the i-th row and j-th column ($i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, s$) is given by

$$\frac{N!}{n_{11}!\,n_{12}!\cdots n_{rs}!}\,p_{11}^{n_{11}}p_{12}^{n_{12}}\cdots p_{rs}^{n_{rs}}. \tag{12}$$

Here we are interested in testing the hypothesis that the classification by rows is independent of the classification by columns, that is, that $p_{ij}$ is of the form $p_iq_j$, where

$$\sum_i p_i = 1, \qquad \sum_j q_j = 1. \tag{13}$$

For this hypothesis the appropriate likelihood criterion, say $\lambda_c'$, is the ratio of the maximum (ω(max)) of (12) when $p_{ij} = p_iq_j$, restricted by the conditions (13), to the maximum (Ω(max)) of (12) subject only to the condition $\sum_{i,j}p_{ij} = 1$. $\lambda_c'$ turns out to be identical with $\lambda_c$ in (7). When the hypothesis of independence is true, the approximate distribution of the quantity $-2\log\lambda_c'$ is the same as that of $-2\log\lambda_c$ when the hypothesis of a common population is true. To show that the distributions are the same we note that by placing

$$x_{ij} = \frac{n_{ij} - Np_iq_j}{\sqrt{N}} \tag{14}$$

we find from (12) that the $x_{ij}$ are approximately distributed according to the function

$$\frac{1}{(2\pi)^{\frac{rs-1}{2}}(p_1p_2\cdots p_r)^{\frac{s}{2}}(q_1q_2\cdots q_s)^{\frac{r}{2}}}\exp\Big(-\frac{1}{2}\sum_{i,j}\frac{x_{ij}^2}{p_iq_j}\Big), \tag{15}$$

where $\sum_{i,j}x_{ij} = 0$. To the same degree of approximation we find

$$-2\log\lambda_c' = \sum_{i,j}\frac{x_{ij}^2}{p_iq_j} - \sum_i\frac{\big(\sum_j x_{ij}\big)^2}{p_i} - \sum_j\frac{\big(\sum_i x_{ij}\big)^2}{q_j}. \tag{16}$$

Now the characteristic function of $\chi_0'^2$, the quantity on the right side of (16), can be shown without much difficulty to be identical with that of $\chi_0^2$ as given by (10). The identity of the characteristic functions of $\chi_0'^2$ and $\chi_0^2$ implies the identity of the asymptotic distributions of $-2\log\lambda_c'$ and $-2\log\lambda_c$. The problem of testing the hypothesis of a common population in several samples of grouped data is mathematically equivalent to that of testing the hypothesis of independence in contingency tables.

If the usual estimates $p_i = n_{i.}/N$, $q_j = n_{.j}/N$ are used in (16), we find that $\chi_0'^2$ becomes the expression given by (11). But (11) differs from $\chi_0'^2$ by terms of order $1/\sqrt{N}$ and higher. Therefore, $-2\log\lambda_c'$ and (11) can differ from each other only by terms of order $1/\sqrt{N}$, which is the order of approximation involved in getting (15) from (12). Thus, $-2\log\lambda_c'$ has as much validity as the usual criterion (11) for testing for independence in contingency tables.

The $\lambda_c'$ method can easily be extended to the case of contingency tables of higher order. For example, in a three-way table of r rows, s columns and t layers, in which $n_{ijk}$ is the number of items observed in the i-th row, j-th column and k-th layer, the λ' criterion for testing the hypothesis of independence, that is, that the probabilities $p_{ijk}$ are of the form $p_{1i}p_{2j}p_{3k}$, is such that

$$-2\log\lambda' = 2\sum_{i,j,k}(n_{ijk}\log n_{ijk}) + 4N\log N - 2\sum_i(n_{i..}\log n_{i..}) - 2\sum_j(n_{.j.}\log n_{.j.}) - 2\sum_k(n_{..k}\log n_{..k}), \tag{17}$$

where $n_{i..} = \sum_{j,k}n_{ijk}$, and so on. $-2\log\lambda'$ in this case is approximately distributed like χ² with $rst - r - s - t + 2$ degrees of freedom.
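Formula (17) translates almost line for line into code. The sketch below (hypothetical names; the table is a nested list `n[i][j][k]`, and $0\log 0$ is taken as 0) also returns the degrees of freedom quoted above. With a single layer (t = 1) the expression collapses to the two-way criterion from (7), and a table built exactly as a product of its margins gives 0; both make convenient consistency checks.

```python
import math
from itertools import product


def xlogx(v: float) -> float:
    """v * log(v), with the convention 0 * log(0) = 0."""
    return v * math.log(v) if v > 0 else 0.0


def neg2_log_lambda_3way(n):
    """-2 log lambda' from (17) for a nested list n[i][j][k]."""
    r, s, t = len(n), len(n[0]), len(n[0][0])
    cells = list(product(range(r), range(s), range(t)))
    N = sum(n[i][j][k] for i, j, k in cells)
    rows = [sum(n[i][j][k] for j in range(s) for k in range(t)) for i in range(r)]
    cols = [sum(n[i][j][k] for i in range(r) for k in range(t)) for j in range(s)]
    layers = [sum(n[i][j][k] for i in range(r) for j in range(s)) for k in range(t)]
    return (2 * sum(xlogx(n[i][j][k]) for i, j, k in cells)
            + 4 * xlogx(N)
            - 2 * sum(map(xlogx, rows))
            - 2 * sum(map(xlogx, cols))
            - 2 * sum(map(xlogx, layers)))


def degrees_of_freedom_3way(r: int, s: int, t: int) -> int:
    """Degrees of freedom quoted after (17)."""
    return r * s * t - r - s - t + 2


if __name__ == "__main__":
    # Counts built as a product of margins, so independence holds exactly
    table = [[[a * b * c for c in (2, 2)] for b in (1, 3)] for a in (1, 2)]
    print(neg2_log_lambda_3way(table), degrees_of_freedom_3way(2, 2, 2))
```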

4. Illustrative examples. To illustrate the use of $\lambda_s$ we shall consider the following example given by R. A. Fisher² dealing with de Winton and Bateson's data on results of interbreeding the hybrid (F₁) generation of Primula in which two factors are considered.

| | Flat leaves, Normal eye | Flat leaves, Primrose Queen eye | Crimped leaves, Lee's eye | Crimped leaves, Primrose Queen eye | Total |
|---|---|---|---|---|---|
| Observed ($n_j$) | 328 | 122 | 77 | 33 | 560 |
| Expected ($Np_j$) | 315 | 105 | 105 | 35 | 560 |

² Statistical Methods for Research Workers, 4th ed. p. 84.
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If the two factors are Mendelian, that is, segregate independently, the four 
classes of offspring resulting from interbreeding the Fi generation are expected 
to appear in the ratio 9 :3 :3 : 1 (assuming all classes equally viable). We wish to 
test the hypothesis of a 9 :3 :3 : 1 ratio. It is found that 

^)J = 11.50 , 

Entering Fisher's table for n = 3, we find that the chance of exceeding the value 11.50 is less than .01, which is significant if we take P = .05 as the critical level of significant deviation. Thus, the observed frequencies cannot be reasonably explained as chance deviations from the 9:3:3:1 ratio.

The usual $\chi^2$ method gives $\chi^2 = 10.87$ and n = 3 for the 9:3:3:1 hypothesis. The value of P in this case lies between .01 and .02. It follows from the theoretical discussion that 10.87 has no greater validity than 11.50 in testing this hypothesis.
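The two statistics for the Primula example can be reproduced with a few lines of Python. This is a minimal sketch, not the author's computing procedure: it works with natural logarithms directly (equivalent to applying the factor $2\log_e 10$ to common logarithms), and the helper names are illustrative.

```python
import math


def minus_two_log_lambda_gof(observed, probs):
    """-2 log lambda_1 = 2 * sum n_i * ln(n_i / (N p_i)) for specified proportions p_i."""
    N = sum(observed)
    return 2 * sum(n * math.log(n / (N * p)) for n, p in zip(observed, probs) if n > 0)


def pearson_chi2(observed, probs):
    """The usual Pearson chi-square, sum (n_i - N p_i)^2 / (N p_i)."""
    N = sum(observed)
    return sum((n - N * p) ** 2 / (N * p) for n, p in zip(observed, probs))


observed = [328, 122, 77, 33]              # de Winton and Bateson's Primula counts
probs = [9 / 16, 3 / 16, 3 / 16, 1 / 16]   # the 9:3:3:1 hypothesis
print(round(minus_two_log_lambda_gof(observed, probs), 2))  # 11.5
print(round(pearson_chi2(observed, probs), 2))              # 10.87
```

Both quantities are referred to the $\chi^2$ table with 3 degrees of freedom.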

We shall illustrate the use of $\lambda_c$ by using another example given by Fisher dealing with Wachter's data for back-crosses in mice.





                     Black      Black      Brown      Brown
                     Self       Piebald    Self       Piebald    Total
Coupling:
  F₁ Males           88         82         75         60         305
  F₁ Females         38         34         30         21         123
Repulsion:
  F₁ Males           115        93         80         130        418
  F₁ Females         96         88         95         79         358
Total                337        297        280        290        1204


The back-crosses were made according as the male or female parents of the $F_1$ generation were heterozygous in the two factors Black-Brown, Self-Piebald, and according to whether the two dominant genes came both from one parent (Coupling) or one from each parent (Repulsion). We wish to test the hypothesis that the proportions are independent of the matings used. We find

$$-2\log_e\lambda_c = 2\log_e 10\left[\sum n_{ij}\log_{10} n_{ij} + N\log_{10} N - \sum n_{i.}\log_{10} n_{i.} - \sum n_{.j}\log_{10} n_{.j}\right] = 21.69.$$

Entering Fisher's $\chi^2$ table for n = 9, we find that the chance of exceeding this value is less than .01. The departure from the hypothesis of independence is significant on the basis of the P = .05 level. The usual $\chi^2$ method gives the remarkably close result $\chi^2 = 21.83$, which, with n = 9, gives P < .01.
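A similar sketch for the Wachter table, assuming the four mating classes are the rows and the four colour classes the columns; the helper names are illustrative. It computes $-2\log\lambda_c$ from the cell counts and margins, the usual $\chi^2$ for comparison, and the $(r-1)(c-1)$ degrees of freedom. The printed values should agree closely with the 21.69, 21.83 and 9 quoted above.

```python
import math


def xlogx(v):
    """v * ln(v), with 0 * ln(0) taken as 0."""
    return v * math.log(v) if v > 0 else 0.0


def independence_tests(table):
    """Return (-2 log lambda_c, Pearson chi-square, df) for an r x c table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    N = sum(rows)
    g2 = 2 * (sum(xlogx(x) for r in table for x in r) + xlogx(N)
              - sum(xlogx(x) for x in rows) - sum(xlogx(x) for x in cols))
    chi2 = sum((table[i][j] - rows[i] * cols[j] / N) ** 2 / (rows[i] * cols[j] / N)
               for i in range(len(rows)) for j in range(len(cols)))
    df = (len(rows) - 1) * (len(cols) - 1)
    return g2, chi2, df


# Rows: coupling males, coupling females, repulsion males, repulsion females;
# columns: black self, black piebald, brown self, brown piebald.
wachter = [[88, 82, 75, 60],
           [38, 34, 30, 21],
           [115, 93, 80, 130],
           [96, 88, 95, 79]]
g2, chi2, df = independence_tests(wachter)
print(round(g2, 2), round(chi2, 2), df)
```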

5. Summary. We have considered the exact expressions for the Neyman-Pearson λ criteria appropriate to the following hypotheses: (1) that a sample of grouped data is from a population with specified group proportions (the fundamental problem), (2) that several samples of grouped data are from a common population, (3) that there is independence in a contingency table. The quantity $-2\log\lambda$ for each of these cases is approximately distributed like $\chi^2$, the number of degrees of freedom being given in each case. It is shown that the usual $\chi^2$ method of testing these hypotheses has no greater theoretical validity than the λ method. On the practical side, it is to be remarked that $-2\log\lambda$ can be computed with fewer operations than $\chi^2$. Two examples are given to illustrate the practical application of the λ method.

Princeton University. 



THE PROBABILITY THAT THE MEAN OF A SECOND SAMPLE WILL 
DIFFER FROM THE MEAN OF A FIRST SAMPLE BY LESS THAN 
A CERTAIN MULTIPLE OF THE STANDARD DEVIATION OF 
THE FIRST SAMPLE 

By G. A. Baker, Ph.D.

The following statement of the significance of a probable error is often made: "The probable error of the mean is a value above and below the mean such that if the test were repeated under the same conditions there would be, on the average, equal chances that the mean would fall within or without this range." The probable error is attached to the mean of the sample, and it is assumed that the standard deviation of the sample is that of the sampled normal population. This was formerly a very usual explanation of the meaning of probable error by research workers, but it is inaccurate and misleading, especially for samples of 20 or less such as are dealt with in agricultural experiments. The inaccuracy of this explanation of the meaning of probable error has been realized for many years by competent statisticians, but no satisfactory treatment has heretofore been devised.¹

The attempted explanation of the probable error in terms of the expected frequency of deviations of different sizes of the means of future samples from the sample mean does raise a very interesting, important, and legitimate question, namely, what is the probability that the mean of a second sample will lie within a certain multiple of the standard deviation of a first sample of the mean of that first sample? This question is of fundamental concern to those engaged in experimental work. Its answer will indicate to investigators reasonable deviations from the results of their first experiments, will form a valid basis for the rejection of doubtful observations or groups of such observations, and will form a basis for a test of the significance of the divergence of results in different experiments. It is found that the usual method of treating the probable error gives an overly optimistic idea of the smallness of the deviations that may be expected in future samples.

The distribution function of the variable

$$v = \frac{x - z}{y},$$

where x is the mean of the first sample, z is the mean of the second sample, and y is the standard deviation of the first sample, is obtained in this paper. The sampled population is assumed to be normal.

¹ Camp, Burton H. "Suggested Problems for Mathematical Research," Journal American Statistical Association, Supplement Vol. 30, No. 189A, Mar. 1935, p. 259, No. 5.
Let the sampled population be represented by

$$f(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}, \qquad -\infty \le x \le \infty. \tag{1}$$

If a sample of n is drawn from (1), the means, as is well known, will be distributed as proportional to

$$e^{-nx^2/2}, \qquad -\infty \le x \le \infty, \tag{2}$$

and the standard deviation will be distributed as proportional to

$$y^{n-2}e^{-ny^2/2}, \qquad 0 \le y \le \infty. \tag{3}$$

If a second sample of n is drawn from (1), its mean will be distributed as proportional to

$$e^{-nz^2/2}, \qquad -\infty \le z \le \infty. \tag{4}$$

Consider the expression

$$\frac{x - z}{y} \tag{5}$$

and call it v. Then v is the difference between the means of the two samples measured in terms of the standard deviation of the first sample. The distribution function of v is sought.

The three variables x, y, and z are independent. Let y, for the moment, have a constant value and write

$$vy = x - z. \tag{6}$$


The probability of a given value of vy in d(vy) for a given value of y is now being sought, that is, vy is regarded as constant. This probability is proportional to

$$y^{n-2}e^{-ny^2/2}\int_{-\infty}^{\infty} e^{-nz^2/2}\,e^{-n(vy+z)^2/2}\,dz\;d(vy) \tag{7}$$

from the application of the following

Lemma. If x and y are independent variables, and the probability of an x in dx is f(x)dx and the probability of a y in dy is φ(y)dy, then the probability of v = y − x in dv is proportional to²

$$\int f(x)\,\varphi(v + x)\,dx\;dv.$$


Thus the probability of a value of v in dv for a given y is proportional to

$$y^{n-1}e^{-ny^2/2}\int_{-\infty}^{\infty} e^{-nz^2/2}\,e^{-n(vy+z)^2/2}\,dz\;dv \tag{8}$$


² Baker, G. A. "Random Sampling from Non-Homogeneous Populations," Metron, Vol. 8, No. 3, Feb. 1930, p. 68 (slightly modified).
since d(vy) = y dv for y constant. Hence the total probability of a particular value of v in dv will be given as proportional to

$$dv\int_{0}^{\infty} y^{n-1}e^{-ny^2/2}\int_{-\infty}^{\infty} e^{-nz^2/2}\,e^{-n(vy+z)^2/2}\,dz\,dy, \tag{9}$$

which is proportional to

$$\frac{dv}{\left(1 + \frac{v^2}{2}\right)^{n/2}}. \tag{10}$$
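The step from (9) to (10) can be checked numerically. The sketch below assumes a modest sample size n = 4, truncates the infinite ranges where the Gaussian factors are negligible, evaluates the double integral in (9) by Simpson's rule, and compares its ratio to the value at v = 0 with $(1 + v^2/2)^{-n/2}$; the function names are hypothetical.

```python
import math


def simpson(f, a, b, steps=200):
    """Composite Simpson's rule on [a, b]; `steps` must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def integral_9(v, n, y_max=6.0, z_max=6.0):
    """The double integral in (9) for a given v (dv factor dropped), with both
    infinite ranges truncated where the Gaussian factors are negligible for small n."""
    def inner(y):
        return simpson(lambda z: math.exp(-n * z * z / 2 - n * (v * y + z) ** 2 / 2),
                       -z_max, z_max)
    return simpson(lambda y: y ** (n - 1) * math.exp(-n * y * y / 2) * inner(y), 0.0, y_max)


n = 4
base = integral_9(0.0, n)
for v in (0.5, 1.0, 2.0):
    # Dividing by the v = 0 value removes the normalizing constant.
    print(v, integral_9(v, n) / base, (1 + v * v / 2) ** (-n / 2))
```

Analytically, the inner integral equals $\sqrt{\pi/n}\,e^{-nv^2y^2/4}$, and the remaining y-integral then yields the power $(1+v^2/2)^{-n/2}$ directly; the numerical version only confirms that step.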


If the number in the first sample is $n_1$ and the number in the second sample is $n_2$, then (10) becomes

$$\frac{dv}{\left(1 + \frac{n_2}{n_1 + n_2}\,v^2\right)^{n_1/2}}. \tag{11}$$



This distribution, (11), permits an answer to be given to the question: what is the probability that the mean of a sample of a given size $n_2$ will differ from the mean of a first sample of size $n_1$ by as much as a constant multiple of the standard deviation of the first sample? Thus, this distribution gives a clear and comprehensible indication of the expected conformity of future experiments and gives a valuable test for the significance of the difference between two means. If it is desired to use this distribution as a rejection criterion, $n_1$ should be taken so as to include as many items as possible and so as to exclude the doubtful ones. The doubtful items should be included in the second sample. If the original sample is broken up into two or more samples, it must be done in such a way as not to destroy the randomness of the resulting parts.

Example. Suppose for the purpose of illustration that a sample of four is to be considered. The proper v-distribution is

$$\frac{\sqrt{2}}{\pi}\,\frac{dv}{\left(1 + \frac{v^2}{2}\right)^{2}}.$$

The value of v which is necessary to give a probability of one-half is a root of

$$\tan^{-1}\frac{p}{\sqrt{2}} + \frac{1}{2}\,\frac{\sqrt{2}\,p}{p^2 + 2} = \frac{\pi}{4},$$


which is .9. That is, an interval of 1.8 times the standard deviation of the sample of four with center at the mean of the sample is necessary for a probability of one-half that the mean of the next sample of four will lie in this interval. This compares with .75 times the standard deviation of the sample if $\sigma/\sqrt{n-1}$ is used as the probable error of the mean and with .65 times the standard deviation of the sample if $\sigma/\sqrt{n}$ is used as the probable error of the mean. The last two methods of calculating a probable error, with the interpretation indicated at the beginning of this paper, give the investigator an unwarranted feeling of assurance about the agreement of future samples with a first sample.

If two samples of $n_1$ and $n_2$ are drawn from the normal population (1), then these samples can be combined for the purpose of calculating a standard deviation, and the difference between the means of the samples can be measured in terms of the standard deviation of the combined sample. The distribution function of the difference of the means divided by the standard deviation of the combined sample is

$$\frac{dv}{\left(1 + \frac{n_1 n_2}{(n_1 + n_2)^2}\,v^2\right)^{(n_1+n_2)/2}}. \tag{11'}$$


This distribution, (11′), is the basis for a valid test for the significance of the difference between two means. If either this test or the test based on distribution (11) shows a significant difference between the means, it cannot be ignored. "Student's" t-distribution is proportional to

$$\frac{dt}{\left(1 + \frac{t^2}{N-1}\right)^{N/2}}. \tag{12}$$


The above distributions can be easily transformed into t-distributions so that "Student's" tables can be used. For instance, if we put

$$v = \frac{\sqrt{2}\,t}{\sqrt{n-1}}, \qquad N = n,$$

then (10) becomes proportional to (12). Again, put

$$v = \frac{\sqrt{n_1 + n_2}\,t}{\sqrt{n_2}\,\sqrt{n_1 - 1}}, \qquad N = n_1,$$

and (11) becomes proportional to (12). Finally, put

$$v = \frac{(n_1 + n_2)\,t}{\sqrt{n_1 n_2}\,\sqrt{n_1 + n_2 - 1}}, \qquad N = n_1 + n_2,$$

and (11′) becomes proportional to (12).
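Since the substitution for (11) carries it into Student's form (12) with N = n₁, that is, into a t-distribution with n₁ − 1 degrees of freedom, probabilities for v can be obtained without a v-table. The following Python sketch assumes that substitution and, in place of Student's tables, integrates the t density numerically with Simpson's rule; the function names are illustrative.

```python
import math


def t_density(x, df):
    """Student's t density with `df` degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def prob_abs_t_below(t, df, steps=2000):
    """P(|T| < t), by Simpson's rule on [0, t] doubled by symmetry (steps even)."""
    h = t / steps
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 2 * total * h / 3


def prob_abs_v_below(c, n1, n2):
    """P(|v| < c) under (11): put t = v * sqrt(n2 (n1 - 1) / (n1 + n2)),
    which turns (11) into Student's t with n1 - 1 degrees of freedom."""
    t = c * math.sqrt(n2 * (n1 - 1) / (n1 + n2))
    return prob_abs_t_below(t, n1 - 1)
```

For instance, `prob_abs_v_below(2.0, 10, 10)` estimates the chance that the mean of a second sample of ten falls within two first-sample standard deviations of the first mean.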


Summary. The distributions found for the difference of the means of two samples in terms of a standard deviation of one sample or a combination of both samples are similar to, and easily transformed into, "Student's" t-distribution, so that his tables can be used. However, these distributions answer a practical, interesting, and important question that "Student's" t-distribution does not. If in an experimental science a series of observations is made, it is desirable to know how much a similar series of observations could be expected to differ from the set of observations now available. This deviation, if it is to mean anything, must be expressed in terms of quantities available from the observations already made. This paper gives the probability function of a deviation in the mean of a future sample measured from the mean of a first sample and measured in terms of the standard deviation of a first sample, that is, in terms of quantities known from the first sample. It is a very definite advantage and a great gain in assurance to know the point from which measurements are being made and the unit in which they are expressed, instead of making vague, ill-defined assumptions about the zero point and unit length of the measuring scale. It is true that differences that were formerly considered significant may not be so considered now. But these differences would appear insignificant if experiments were sufficiently repeated, so that the net result is fewer inconsistencies to explain away.



ON SAMPLES FROM A MULTIVARIATE NORMAL POPULATION¹

By Solomon Kullback 

1. Introduction. In this paper we shall discuss the distribution of certain 
functions calculated for samples drawn from a multivariate normal population. 
The method of solution is based on the theory of characteristic functions and 
presents further application of that theory to the distribution problem of 
statistics.²

We shall have occasion to refer to the multivariate normal population whose distribution law is given by

$$F(x) = \frac{|B_{pq}|^{1/2}}{\pi^{n/2}}\,e^{-B(x-m,\,x-m)} \qquad (p, q = 1, 2, \cdots, n) \tag{1.1}$$

where $B(x - m, x - m)$ is the real, positive definite quadratic form of the $x_p - m_p$ with matrix $\|B_{pq}\|$. Here $m_p$ is the mean in the population of the pth variate and $B_{pq} = \Delta_{pq}/2\sigma_p\sigma_q\Delta$, where $\sigma_p$ is the standard deviation in the population of the pth variate; Δ is the determinant of population correlations $\rho_{pq} = \rho_{qp}$; $\Delta_{pq}$ is the cofactor of $\rho_{pq}$ in Δ; and $|B_{pq}|$ is the determinant of the matrix $\|B_{pq}\|$.

Since the integral of (1.1) over the entire field of variation of the variables is unity, we have (using abbreviated notation)

$$\int e^{-B(x-m,\,x-m)}\,dx = \frac{\pi^{n/2}}{|B_{pq}|^{1/2}}. \tag{1.2}$$

Equation (1.2) will be true if $\|B_{pq}\|$ is complex, provided its real part is symmetric and positive definite.³

The distribution of sample means of samples from the population (1.1) is independent of the distribution of the system of sample variances and covariances and is given by⁴

$$F_1(\bar x) = \frac{|A_{pq}|^{1/2}}{\pi^{n/2}}\,e^{-A(\bar x-m,\,\bar x-m)} \tag{1.3}$$

where $A(\bar x - m, \bar x - m)$ is the real, positive definite quadratic form of the $\bar x_p - m_p$ with matrix $\|A_{pq}\|$. Here $\bar x_p = (1/N)\sum_{\alpha=1}^{N} x_{p\alpha}$ is the sample mean of the pth


¹ Presented to the American Mathematical Society, February 23, 1935.

² For more complete reference to the theory of characteristic functions as applied to statistics see S. Kullback, Annals of Mathematical Statistics, Vol. 5 (1934), pp. 263-307.

³ J. Wishart and M. S. Bartlett, Proc. Cambridge Phil. Soc., Vol. 29 (1933), pp. 260 ff.

⁴ J. Wishart, Biometrika, Vol. 20 A (1928), pp. 32-52.

J. Wishart and M. S. Bartlett, loc. cit.
variate, and $A_{pq} = NB_{pq}$, where $B_{pq}$ has been defined for equation (1.1). The distribution law of the system of sample variances and covariances is given by⁵

$$F_2(a) = \frac{|A_{pq}|^{(N-1)/2}}{\pi^{n(n-1)/4}\prod_{r=1}^{n}\Gamma\left(\frac{N-r}{2}\right)}\,e^{-A(a)}\,|a_{pq}|^{(N-n-2)/2} \tag{1.4}$$

where $A(a) = \sum_{p,q=1}^{n} A_{pq}a_{pq}$ and $a_{pq} = a_{qp} = (1/N)\sum_{\alpha=1}^{N}(x_{p\alpha} - \bar x_p)(x_{q\alpha} - \bar x_q)$, with $A_{pq}$ and $\bar x_p$ defined as for (1.3). Since the integral of (1.4) over the entire field of variation of the $a_{pq}$ is unity, we have⁶

$$\int |a_{pq}|^{(N-n-2)/2}\,e^{-A(a)}\,da = \frac{\pi^{n(n-1)/4}\prod_{r=1}^{n}\Gamma\left(\frac{N-r}{2}\right)}{|A_{pq}|^{(N-1)/2}}. \tag{1.5}$$

Equation (1.5) will also hold if the matrix $\|A_{pq}\|$ is complex, provided its real part is symmetric and positive definite.⁷

2. Variance. Consider a sample of N independent items from the normal population (1.1). Let

$$v = \sum_{p,q=1}^{n} a_{pq} \tag{2.1}$$

where $a_{pq}$ is defined as in (1.4). From the theory of characteristic functions and (1.5), we have that the characteristic function of the distribution law of v is given by⁸

$$\varphi(t) = \int e^{itv}F_2(a)\,da = |A_{pq}|^{(N-1)/2}\,|A_{pq} - it|^{-(N-1)/2}. \tag{2.2}$$

It may be readily shown that

$$|A_{pq} - it| = |A_{pq}| - it\sum_{p,q=1}^{n} A^{pq}, \tag{2.3}$$

where $A^{pq}$ is the cofactor of $A_{pq}$ in $|A_{pq}|$.

We thus have for the distribution law of v

$$P(v) = \frac{(A/c)^{(N-1)/2}}{2\pi}\int_{-\infty}^{\infty} e^{-ivt}\,(A/c - it)^{-(N-1)/2}\,dt \tag{2.4}$$


⁵ J. Wishart, loc. cit.

⁶ Cf. S. S. Wilks, Biometrika, Vol. 24 (1932), pp. 471-494.

⁷ A. E. Ingham, Proc. Cambridge Phil. Soc., Vol. 29 (1933), p. 271 ff. The considerations in this paper will still hold if the condition above is imposed.

⁸ S. Kullback, loc. cit., p. 272.
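where $A = |A_{pq}|$, $c = \sum_{p,q=1}^{n} A^{pq}$, and $A/c > 0$ since $\|A_{pq}\|$ is positive definite. By using the fact that⁹

$$\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-i\mu t}\,(k - it)^{-a}\,dt = \begin{cases} \dfrac{\mu^{a-1}e^{-k\mu}}{\Gamma(a)}, & \mu > 0,\\[4pt] 0, & \mu \le 0,\end{cases} \tag{2.5}$$

where $k > 0$, $a > 0$, we have

$$P(v) = \frac{(A/c)^{(N-1)/2}}{\Gamma\left(\frac{N-1}{2}\right)}\,v^{(N-3)/2}\,e^{-(A/c)v}. \tag{2.6}$$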
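Identity (2.3) and the constant $A/c$ in (2.6) are easy to check on a small case. The sketch below uses exact rational arithmetic. Because both sides of (2.3) are polynomials in the quantity subtracted from every entry, it substitutes a real number s for $it$, an assumption that suffices for checking the identity. The matrix is an illustrative choice: $A = NB$ with $N = 10$ and population covariance matrix [[2, 1], [1, 3]], so that $A$ is $(N/2)$ times the inverse covariance matrix.

```python
from fractions import Fraction


def det(m):
    """Determinant by cofactor expansion along the first row (adequate for small n)."""
    if not m:
        return 1
    total = 0
    for j in range(len(m)):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def cofactor_sum(m):
    """c = sum over p, q of the cofactor of the entry m[p][q]."""
    n = len(m)
    c = 0
    for p in range(n):
        for q in range(n):
            minor = [row[:q] + row[q + 1:] for i, row in enumerate(m) if i != p]
            c += (-1) ** (p + q) * det(minor)
    return c


def shifted_det(m, s):
    """|m_pq - s|: the same scalar s subtracted from every entry."""
    return det([[x - s for x in row] for row in m])


A = [[Fraction(3), Fraction(-1)], [Fraction(-1), Fraction(2)]]
s = Fraction(1, 3)
# Identity (2.3), with the real number s standing in for it:
print(shifted_det(A, s), det(A) - s * cofactor_sum(A))  # 8/3 8/3
print(det(A) / cofactor_sum(A))                          # A/c = 5/7
```

Since $A_{pq} = NB_{pq}$ and $\|B_{pq}\|$ is half the inverse of the population covariance matrix, $A/c = N/(2\sum_{p,q}\rho_{pq}\sigma_p\sigma_q)$, which here is $10/14 = 5/7$. In that form (2.6) says that $Nv/\sum_{p,q}\rho_{pq}\sigma_p\sigma_q$ follows the $\chi^2$ law with $N - 1$ degrees of freedom.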


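3. Ratio of variances. If $v_1$ and $v_2$ represent the statistic v (defined in (2.1)), obtained from independent samples of $N_1$ and $N_2$ items respectively, then it may be shown that the distribution law of $w = v_1/v_2$ is given by¹⁰

$$P(w) = \frac{\Gamma\left(\frac{N_1+N_2-2}{2}\right)}{\Gamma\left(\frac{N_1-1}{2}\right)\Gamma\left(\frac{N_2-1}{2}\right)}\,\frac{N_1^{(N_1-1)/2}N_2^{(N_2-1)/2}\,w^{(N_1-3)/2}}{(N_1w + N_2)^{(N_1+N_2-2)/2}}. \tag{3.1}$$

If we set $w = \frac{N_2n_1}{N_1n_2}\,e^{2z}$, where $n_1 = N_1 - 1$ and $n_2 = N_2 - 1$, we obtain for the distribution law of z¹¹

$$P(z) = \frac{2\,n_1^{n_1/2}\,n_2^{n_2/2}\,\Gamma\left(\frac{n_1+n_2}{2}\right)}{\Gamma\left(\frac{n_1}{2}\right)\Gamma\left(\frac{n_2}{2}\right)}\,\frac{e^{n_1z}}{(n_1e^{2z} + n_2)^{(n_1+n_2)/2}}. \tag{3.2}$$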




4. Student's distribution. Consider a sample of N independent items from the normal population (1.1). Let

$$\mu = \sum_{p,q=1}^{n}(\bar x_p - m_p)(\bar x_q - m_q) \tag{4.1}$$

where $\bar x_p$ and $m_p$ are defined as in (1.3). The characteristic function of the simultaneous distribution function of μ, defined as in (4.1), and v, defined as in (2.1), is given by

$$\varphi(t_1, t_2) = \int \exp\Big\{it_1\sum_{p,q=1}^{n}(\bar x_p - m_p)(\bar x_q - m_q) + it_2\sum_{p,q=1}^{n} a_{pq}\Big\}\,F_1(\bar x)F_2(a)\,d\bar x\,da \tag{4.2}$$


⁹ Cf. A. E. Ingham, loc. cit.

¹⁰ J. Wishart and M. S. Bartlett, Proc. Cambridge Phil. Soc., Vol. 28 (1932), p. 455 ff.

S. Kullback, note accepted for publication soon in the Annals of Math. Statistics.

¹¹ Cf. R. A. Fisher, I. Proc. International Math. Congress, Toronto (1924), Vol. 2, pp. 805-813.

R. A. Fisher, II. Statistical Methods for Research Workers, 4th Edition (1932), Edinburgh: Oliver and Boyd, pp. 224-227.
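where $F_1$ and $F_2$ are defined as in (1.3) and (1.4) respectively. From (1.2) and (1.5) we have that

$$\varphi(t_1, t_2) = (A/c)^{N/2}\,(A/c - it_1)^{-1/2}\,(A/c - it_2)^{-(N-1)/2} \tag{4.3}$$

where A and c are defined as in (2.4). The simultaneous distribution of μ and v is given by

$$P(\mu, v) = \frac{1}{(2\pi)^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-i\mu t_1 - ivt_2}\,\varphi(t_1, t_2)\,dt_1\,dt_2 \tag{4.4}$$

which, evaluated by a procedure similar to that used for (2.4), yields

$$P(\mu, v) = \frac{(A/c)^{N/2}}{\Gamma\left(\frac{N-1}{2}\right)\Gamma\left(\frac{1}{2}\right)}\,\mu^{-1/2}\,v^{(N-3)/2}\,e^{-(A/c)(\mu + v)}. \tag{4.5}$$

From (4.5) we may readily obtain the distribution of $z = (\mu/v)^{1/2}$ to be¹²

$$D(z) = \frac{2\,\Gamma\left(\frac{N}{2}\right)}{\Gamma\left(\frac{N-1}{2}\right)\Gamma\left(\frac{1}{2}\right)}\,\frac{1}{(1 + z^2)^{N/2}} \qquad (0 \le z \le \infty). \tag{4.6}$$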


5. k samples. Suppose we have k independent samples of $N_1, N_2, \cdots, N_k$ items respectively, drawn from the normal population defined by (1.1). Let $\mu_r$ $(r = 1, 2, \cdots, k)$ be the statistic μ, defined by (4.1), for each of the k samples respectively; let $v_r$ $(r = 1, 2, \cdots, k)$ be the statistic v, defined by (2.1), for each of the k samples respectively; let $\mu_0$ and $V_0$ be the values of these statistics for the sample of $N = N_1 + N_2 + \cdots + N_k$ items obtained by pooling all the samples.

It may be readily verified that

$$\mu_0 = \sum_{r=1}^{k} N_r^2\mu_r/N^2 + 2\sum_{\alpha<\beta}\mu_\alpha^{1/2}\mu_\beta^{1/2}N_\alpha N_\beta/N^2 \tag{5.1}$$

$$N\mu_0 + NV_0 = \sum_{r=1}^{k}(N_r\mu_r + N_rv_r) \tag{5.2}$$

$$NV_0 = \sum_{r=1}^{k}(N_rv_r + M_r\mu_r) - 2\sum_{\alpha<\beta}\mu_\alpha^{1/2}\mu_\beta^{1/2}N_\alpha N_\beta/N \tag{5.3}$$

where $M_r = (NN_r - N_r^2)/N$.

In view of (2.6) and (4.5), it is evident that the simultaneous distribution law of $\mu_r, v_r$ $(r = 1, 2, \cdots, k)$ is given by

$$P(\mu)\cdot Q(v) = \prod_{r=1}^{k} P(\mu_r; N_r)\,Q(v_r; N_r) \tag{5.4}$$


¹² Cf. "Student," Biometrika, Vol. 6 (1908-09), pp. 1-25.

R. A. Fisher, Metron, Vol. 5 (1925), pp. 90-104.

P. R. Rider, Annals of Mathematics, 2nd S., Vol. 31 (1930), pp. 579-582.
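where

$$P(\mu_r; N_r) = \frac{(N_rB/D)^{1/2}}{\Gamma\left(\frac{1}{2}\right)}\,\mu_r^{-1/2}\,e^{-N_rB\mu_r/D} \tag{5.5}$$

$$Q(v_r; N_r) = \frac{(N_rB/D)^{(N_r-1)/2}}{\Gamma\left(\frac{N_r-1}{2}\right)}\,v_r^{(N_r-3)/2}\,e^{-N_rBv_r/D} \tag{5.6}$$

and B is the determinant $|B_{pq}|$ defined in (1.1) and $D = \sum_{p,q=1}^{n} B^{pq}$, where $B^{pq}$ is the cofactor of $B_{pq}$ in $|B_{pq}|$.

Using (5.3) and (5.4), we find that the characteristic function of the simultaneous distribution law of $\varphi_r = v_rB/D$ $(r = 0, 1, \cdots, k)$, with $v_0 = V_0$, is given by

$$\varphi(t_0, t_1, \cdots, t_k) = \int e^{f(\mu) + F(t_0, t_1, \cdots, t_k)}\,P(\mu)\cdot Q(v)\,d\mu\,dv \tag{5.7}$$

where

$$f(\mu) = (Bit_0/D)\Big(\sum_{r=1}^{k} M_r\mu_r/N - 2\sum_{\alpha<\beta}\mu_\alpha^{1/2}\mu_\beta^{1/2}N_\alpha N_\beta/N^2\Big)$$

and

$$F(t_0, t_1, \cdots, t_k) = (B/D)\sum_{r=1}^{k} v_r\,(it_r + it_0N_r/N).$$

Let $\mu_rB/D = \xi_r^2$ and $v_rB/D = \eta_r$ $(r = 1, 2, \cdots, k)$ and rewrite (5.7) as the product of k + 1 integrals

$$\varphi(t_0, t_1, \cdots, t_k) = I_0\,I_1\cdots I_k \tag{5.8}$$

where

$$I_0 = \frac{(N_1N_2\cdots N_k)^{1/2}}{\left[\Gamma\left(\frac{1}{2}\right)\right]^{k}}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} e^{-\Psi(\xi,\xi)}\,d\xi_1\cdots d\xi_k \tag{5.9}$$

with

$$\Psi(\xi, \xi) = \sum_{r=1}^{k}\xi_r^2\,(N_r - it_0M_r/N) + 2it_0\sum_{\alpha<\beta}\xi_\alpha\xi_\beta N_\alpha N_\beta/N^2,$$

and

$$I_r = \frac{N_r^{(N_r-1)/2}}{\Gamma\left(\frac{N_r-1}{2}\right)}\int_0^{\infty}\eta_r^{(N_r-3)/2}\exp\{-\eta_r(N_r - it_0N_r/N - it_r)\}\,d\eta_r. \tag{5.10}$$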
By employing (1.2) we find that

$$I_0 = (N_1N_2\cdots N_k)^{1/2}\begin{vmatrix} N_1 - it_0M_1/N & it_0N_1N_2/N^2 & \cdots & it_0N_1N_k/N^2\\ it_0N_2N_1/N^2 & N_2 - it_0M_2/N & \cdots & it_0N_2N_k/N^2\\ \vdots & \vdots & & \vdots\\ it_0N_kN_1/N^2 & it_0N_kN_2/N^2 & \cdots & N_k - it_0M_k/N\end{vmatrix}^{-1/2} \tag{5.11}$$

The determinant may be readily evaluated by removing the common factor $N_r$ from the rth row (remembering the value of $M_r$ as given in (5.3)) and applying the operations¹³ (row 1 − row 2), (row 2 − row 3), ..., and then column k + column 1 + column 2 + ... + column k − 1. We thus obtain

$$I_0 = (1 - it_0/N)^{-(k-1)/2}. \tag{5.12}$$

The integral in (5.10) is well known and yields

$$I_r = N_r^{(N_r-1)/2}\,(N_r - it_0N_r/N - it_r)^{-(N_r-1)/2}. \tag{5.13}$$

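There thus results

$$\varphi(t_0, t_1, \cdots, t_k) = G\,(1 - it_0/N)^{-(k-1)/2}\prod_{\alpha=1}^{k}(N_\alpha - it_0N_\alpha/N - it_\alpha)^{-(N_\alpha-1)/2} \tag{5.14}$$

where $G = \prod_{\alpha=1}^{k} N_\alpha^{(N_\alpha-1)/2}$.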

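The simultaneous distribution law of $\varphi_r$ $(r = 0, 1, \cdots, k)$ is given by

$$P(\varphi_0, \varphi_1, \cdots, \varphi_k) = \frac{G}{(2\pi)^{k+1}}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\frac{e^{-i\sum_{\alpha=0}^{k}t_\alpha\varphi_\alpha}\,dt_0\,dt_1\cdots dt_k}{(1 - it_0/N)^{(k-1)/2}\prod_{\alpha=1}^{k}(N_\alpha - it_0N_\alpha/N - it_\alpha)^{(N_\alpha-1)/2}}. \tag{5.15}$$

Integrating successively with respect to $t_k, t_{k-1}, \cdots, t_1$ and applying (2.5), we have

$$P(\varphi_0, \varphi_1, \cdots, \varphi_k) = G\exp\Big(-\sum_{\alpha=1}^{k}N_\alpha\varphi_\alpha\Big)\prod_{\alpha=1}^{k}\frac{\varphi_\alpha^{(N_\alpha-3)/2}}{\Gamma\left(\frac{N_\alpha-1}{2}\right)}\cdot\frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{\exp\{-it_0(\varphi_0 - \sum_{\alpha=1}^{k}N_\alpha\varphi_\alpha/N)\}}{(1 - it_0/N)^{(k-1)/2}}\,dt_0 \tag{5.16}$$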


¹³ Cf. A. C. Aitken, Quarterly Journal Math., Vol. 2 (1931), pp. 130-135.
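and finally

$$P(\varphi_0, \varphi_1, \cdots, \varphi_k) = \frac{G\,N^{(k-1)/2}\,e^{-N\varphi_0}}{\Gamma\left(\frac{k-1}{2}\right)\prod_{\alpha=1}^{k}\Gamma\left(\frac{N_\alpha-1}{2}\right)}\Big(\varphi_0 - \frac{N_1\varphi_1}{N} - \cdots - \frac{N_k\varphi_k}{N}\Big)^{(k-3)/2}\prod_{\alpha=1}^{k}\varphi_\alpha^{(N_\alpha-3)/2}. \tag{5.17}$$

If we apply to (5.17) the transformation

$$\psi_0 = \varphi_0, \qquad \psi_r = \frac{N_r\varphi_r}{N\varphi_0} \quad (r = 1, 2, \cdots, k) \tag{5.18}$$

and integrate out $\psi_0$, we obtain for the simultaneous distribution law of $\psi_r = N_r\varphi_r/N\varphi_0 = N_rv_r/NV_0$

$$D(\psi_1, \psi_2, \cdots, \psi_k) = \frac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{k-1}{2}\right)}\,(1 - \psi_1 - \psi_2 - \cdots - \psi_k)^{(k-3)/2}\prod_{\alpha=1}^{k}\frac{\psi_\alpha^{(N_\alpha-3)/2}}{\Gamma\left(\frac{N_\alpha-1}{2}\right)} \tag{5.19}$$

where the limits of variation in (5.19) are¹⁴

$$0 \le \psi_1 \le 1, \qquad 0 \le \psi_r \le 1 - \psi_1 - \cdots - \psi_{r-1} \quad (r = 2, 3, \cdots, k). \tag{5.20}$$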


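6. Correlation ratio. Let $\zeta = \log(1 - \psi_1 - \psi_2 - \cdots - \psi_k)$, where the $\psi_r$ $(r = 1, 2, \cdots, k)$ are defined and distributed as in (5.19). The characteristic function of the distribution law of ζ is given by

$$\varphi(t) = \frac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{k-1}{2}\right)}\int\cdots\int (1 - \psi_1 - \cdots - \psi_k)^{(k+2it-3)/2}\prod_{\alpha=1}^{k}\frac{\psi_\alpha^{(N_\alpha-3)/2}}{\Gamma\left(\frac{N_\alpha-1}{2}\right)}\,d\psi_1\cdots d\psi_k \tag{6.1}$$

where the limits of variation are given by (5.20). The integral in (6.1) is readily evaluated as a Dirichlet integral,¹⁵ and we obtain

$$\varphi(t) = \frac{\Gamma\left(\frac{N-1}{2}\right)\Gamma\left(\frac{k-1+2it}{2}\right)}{\Gamma\left(\frac{k-1}{2}\right)\Gamma\left(\frac{N-1+2it}{2}\right)}. \tag{6.2}$$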


¹⁴ Cf. J. Neyman and E. S. Pearson, I. Bulletin de l'Académie Polonaise des Sciences et des Lettres, Série A, Sciences Mathématiques, 1931, pp. 460-481.

¹⁵ E. Goursat-E. R. Hedrick, Mathematical Analysis, Vol. I (1904) (Ginn and Co., N. Y.), p. 308.
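The distribution law of ζ is given by

$$P(\zeta) = \frac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{k-1}{2}\right)}\,\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-i\zeta t}\,\frac{\Gamma\left(\frac{k-1+2it}{2}\right)}{\Gamma\left(\frac{N-1+2it}{2}\right)}\,dt. \tag{6.3}$$

Now it may be shown that¹⁶

$$\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-i\zeta t}\,\frac{\Gamma\left(\frac{k-1+2it}{2}\right)}{\Gamma\left(\frac{N-1+2it}{2}\right)}\,dt = \frac{e^{\zeta(k-1)/2}\,(1 - e^{\zeta})^{(N-k-2)/2}}{\Gamma\left(\frac{N-k}{2}\right)} \tag{6.4}$$

so that

$$P(\zeta) = \frac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{k-1}{2}\right)\Gamma\left(\frac{N-k}{2}\right)}\,e^{\zeta(k-1)/2}\,(1 - e^{\zeta})^{(N-k-2)/2}. \tag{6.5}$$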


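If we set $e^{\zeta} = \eta^2$, then we obtain for the distribution¹⁷ of $\eta^2$

$$D(\eta^2) = \frac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{k-1}{2}\right)\Gamma\left(\frac{N-k}{2}\right)}\,(\eta^2)^{(k-3)/2}\,(1 - \eta^2)^{(N-k-2)/2}\,d(\eta^2). \tag{6.6}$$

From its definition we have that

$$\eta^2 = (NV_0 - N_1v_1 - \cdots - N_kv_k)/NV_0, \tag{6.7}$$

which reduces to

$$\eta^2 = (N_1W_1 + N_2W_2 + \cdots + N_kW_k)/NV_0, \tag{6.8}$$

where $W_\alpha = \sum_{p,q=1}^{n}(\bar x_{p\alpha} - \bar x_{p0})(\bar x_{q\alpha} - \bar x_{q0})$, with $\bar x_{p\alpha}$ the sample mean of the pth variate in the αth sample and $\bar x_{p0}$ the sample mean of the pth variate in the sample formed by pooling all the samples.¹⁸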
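The reduction of (6.7) to (6.8) is the familiar split of the pooled sum of squares into within-sample and between-sample parts, applied to the sum of the n variates. A small Python sketch, using made-up bivariate samples purely for illustration, computes η² both ways; the helper names are our own.

```python
def mean_vector(sample):
    """Sample means of each variate."""
    n = len(sample[0])
    return [sum(obs[p] for obs in sample) / len(sample) for p in range(n)]


def v_stat(sample):
    """v = sum over p, q of a_pq, with a_pq = (1/N) sum (x_p - xbar_p)(x_q - xbar_q), as in (2.1)."""
    N, n = len(sample), len(sample[0])
    m = mean_vector(sample)
    return sum((obs[p] - m[p]) * (obs[q] - m[q])
               for obs in sample for p in range(n) for q in range(n)) / N


def eta_squared(samples):
    """Return eta^2 computed from (6.7) and from (6.8)."""
    pooled = [obs for smp in samples for obs in smp]
    N, n = len(pooled), len(pooled[0])
    V0 = v_stat(pooled)
    m0 = mean_vector(pooled)
    within = sum(len(smp) * v_stat(smp) for smp in samples)
    between = 0.0
    for smp in samples:
        ma = mean_vector(smp)
        W = sum((ma[p] - m0[p]) * (ma[q] - m0[q]) for p in range(n) for q in range(n))
        between += len(smp) * W
    return (N * V0 - within) / (N * V0), between / (N * V0)


samples = [
    [(1.0, 2.0), (2.0, 1.5), (0.5, 2.5)],
    [(3.0, 3.5), (2.5, 4.0), (3.5, 3.0), (4.0, 4.5)],
    [(1.5, 1.0), (2.0, 2.0)],
]
print(eta_squared(samples))  # the two entries agree up to floating-point rounding
```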

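In a similar manner, we have that the distribution law of $\eta_\alpha^2$ $(\alpha = 1, 2, \cdots, k)$ is given by

$$D(\eta_\alpha^2) = \frac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{N_\alpha-1}{2}\right)\Gamma\left(\frac{N-N_\alpha}{2}\right)}\,(\eta_\alpha^2)^{(N_\alpha-3)/2}\,(1 - \eta_\alpha^2)^{(N-N_\alpha-2)/2}. \tag{6.9}$$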


It may be of interest to point out another derivation for the distribution of $\kappa^2 = 1 - \eta^2$. Let

$$\theta = (B/D)(N_1v_1 + N_2v_2 + \cdots + N_kv_k), \qquad \theta_0 = (B/D)NV_0. \tag{6.10}$$


¹⁶ Whittaker and Watson, Modern Analysis, 2nd Ed., pp. 283, 333.

¹⁷ Cf. R. A. Fisher, loc. cit., I.

H. Hotelling, Proc. National Academy of Sciences, Vol. XI (1925), pp. 657-662.

¹⁸ Cf. S. S. Wilks, loc. cit., p. 482.
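The characteristic function of the simultaneous distribution law of θ and θ₀ is immediately derivable from (5.14) by replacing $t_0$ by $Nt_0$ and $t_r$ by $N_rt$ $(r = 1, 2, \cdots, k)$. There results

$$\varphi(t, t_0) = (1 - it_0)^{-(k-1)/2}\,(1 - it_0 - it)^{-(N-k)/2}. \tag{6.11}$$

By a procedure similar to that already used we find that the simultaneous distribution law of θ and θ₀ is given by

$$P(\theta, \theta_0) = \frac{\theta^{(N-k-2)/2}\,(\theta_0 - \theta)^{(k-3)/2}\,e^{-\theta_0}}{\Gamma\left(\frac{N-k}{2}\right)\Gamma\left(\frac{k-1}{2}\right)}. \tag{6.12}$$

By applying to (6.12) the transformation $\theta = \theta_0\kappa^2$, $\theta_0 = \theta_0$ and integrating out the value of θ₀, we find for the distribution law of κ²

$$D(\kappa^2) = \frac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{N-k}{2}\right)\Gamma\left(\frac{k-1}{2}\right)}\,(\kappa^2)^{(N-k-2)/2}\,(1 - \kappa^2)^{(k-3)/2}. \tag{6.13}$$

From (6.12) and (6.10) it may be shown that the following estimates of variance all have the same expected value¹⁹

$$\frac{N_1v_1 + N_2v_2 + \cdots + N_kv_k}{N - k}, \qquad \frac{NV_0}{N - 1}, \qquad \frac{NV_0 - N_1v_1 - \cdots - N_kv_k}{k - 1}. \tag{6.14}$$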


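7. Distribution of variances. Let

$$\theta_r = N_rv_rB/D \quad (r = 1, 2, \cdots, k), \qquad \theta_0 = NV_0B/D, \qquad \theta = (B/D)(N_1v_1 + N_2v_2 + \cdots + N_kv_k), \tag{7.1}$$

where the right members of (7.1) are defined as in section 5. It is evident that the characteristic function of the simultaneous distribution law of θ, θ₀, θ_r $(r = 1, 2, \cdots, k-1)$ is derivable from (5.14) by replacing $t_0$ by $Nt_0$, $t_r$ by $N_r(t_r + t)$ $(r = 1, 2, \cdots, k-1)$ and $t_k$ by $N_kt$. Thus

$$\varphi(t, t_0, t_1, \cdots, t_{k-1}) = (1 - it_0)^{-(k-1)/2}\,(1 - it_0 - it)^{-(N_k-1)/2}\prod_{\alpha=1}^{k-1}(1 - it_0 - it - it_\alpha)^{-(N_\alpha-1)/2}. \tag{7.2}$$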


¹⁹ Cf. J. Neyman and E. S. Pearson, II. Biometrika, Vol. 20A (1928), pp. 273-274.
S. Kullback, Annals of Mathematical Statistics, Vol. 6 (1935), pp. 76-77.
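By proceeding as in section 5 we arrive at the result that the simultaneous distribution law of θ, θ₀, θ_r $(r = 1, 2, \cdots, k-1)$ is given by

$$P(\theta, \theta_0, \theta_r) = \frac{(\theta_0 - \theta)^{(k-3)/2}\,e^{-\theta_0}\,(\theta - \theta_1 - \cdots - \theta_{k-1})^{(N_k-3)/2}}{\Gamma\left(\frac{k-1}{2}\right)\Gamma\left(\frac{N_k-1}{2}\right)}\prod_{\alpha=1}^{k-1}\frac{\theta_\alpha^{(N_\alpha-3)/2}}{\Gamma\left(\frac{N_\alpha-1}{2}\right)}, \tag{7.3}$$

where $0 \le \theta_1 + \theta_2 + \cdots + \theta_{k-1} \le \theta \le \theta_0$.

By integrating out the variable θ₀ from (7.3) we have for the simultaneous distribution law of θ, θ_r $(r = 1, 2, \cdots, k-1)$

$$D(\theta, \theta_r) = \frac{e^{-\theta}\,(\theta - \theta_1 - \cdots - \theta_{k-1})^{(N_k-3)/2}}{\Gamma\left(\frac{N_k-1}{2}\right)}\prod_{\alpha=1}^{k-1}\frac{\theta_\alpha^{(N_\alpha-3)/2}}{\Gamma\left(\frac{N_\alpha-1}{2}\right)}. \tag{7.4}$$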

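A procedure similar to that used to derive (5.19) yields for the simultaneous distribution law of

$$h_r^2 = \theta_r/\theta \qquad (r = 1, 2, \cdots, k-1) \tag{7.5}$$

the result

$$D(h_1^2, \cdots, h_{k-1}^2) = \Gamma\left(\tfrac{N-k}{2}\right)\frac{(1 - h_1^2 - \cdots - h_{k-1}^2)^{(N_k-3)/2}}{\Gamma\left(\frac{N_k-1}{2}\right)}\prod_{\alpha=1}^{k-1}\frac{(h_\alpha^2)^{(N_\alpha-3)/2}}{\Gamma\left(\frac{N_\alpha-1}{2}\right)} \tag{7.6}$$

where the limits of variation in (7.6) are²⁰

$$0 \le h_1^2 \le 1, \qquad 0 \le h_r^2 \le 1 - h_1^2 - \cdots - h_{r-1}^2 \quad (r = 2, \cdots, k-1). \tag{7.7}$$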

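In a manner similar to the derivation of (6.6) we find the distribution law of $h_\alpha^2$ $(\alpha = 1, 2, \cdots, k-1)$, $h_k^2 = 1 - h_1^2 - h_2^2 - \cdots - h_{k-1}^2$, to be

$$D(h_\alpha^2) = \frac{\Gamma\left(\frac{N-k}{2}\right)}{\Gamma\left(\frac{N_\alpha-1}{2}\right)\Gamma\left(\frac{N-k-N_\alpha+1}{2}\right)}\,(h_\alpha^2)^{(N_\alpha-3)/2}\,(1 - h_\alpha^2)^{(N-k-N_\alpha-1)/2} \qquad (\alpha = 1, 2, \cdots, k). \tag{7.8}$$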


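From the distribution law in (7.3) we readily obtain that the characteristic function of the distribution law of $y_\alpha^2 = \log\{\theta_\alpha/(\theta_0 - \theta)\}$ is given by

$$\varphi(t) = \frac{\Gamma\left(\frac{N_\alpha-1+2it}{2}\right)\Gamma\left(\frac{k-1-2it}{2}\right)}{\Gamma\left(\frac{N_\alpha-1}{2}\right)\Gamma\left(\frac{k-1}{2}\right)}. \tag{7.9}$$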


²⁰ Cf. J. Neyman and E. S. Pearson, loc. cit., I.
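We thus have that the distribution law of $y_\alpha^2$ is given by

$$P(y_\alpha^2) = \frac{1}{2\pi\,\Gamma\left(\frac{N_\alpha-1}{2}\right)\Gamma\left(\frac{k-1}{2}\right)}\int_{-\infty}^{\infty} e^{-iy_\alpha^2t}\,\Gamma\left(\tfrac{N_\alpha-1+2it}{2}\right)\Gamma\left(\tfrac{k-1-2it}{2}\right)dt. \tag{7.10}$$

The integral in (7.10) is known, and there results

$$P(y_\alpha^2) = \frac{\Gamma\left(\frac{N_\alpha+k-2}{2}\right)}{\Gamma\left(\frac{N_\alpha-1}{2}\right)\Gamma\left(\frac{k-1}{2}\right)}\,\frac{e^{y_\alpha^2(N_\alpha-1)/2}}{(1 + e^{y_\alpha^2})^{(N_\alpha+k-2)/2}}. \tag{7.11}$$

If we set $e^{y_\alpha^2} = \theta_\alpha/(\theta_0 - \theta) = \lambda_\alpha^2$, we have for the distribution of $\lambda_\alpha^2$

$$D(\lambda_\alpha^2) = \frac{\Gamma\left(\frac{N_\alpha+k-2}{2}\right)}{\Gamma\left(\frac{N_\alpha-1}{2}\right)\Gamma\left(\frac{k-1}{2}\right)}\,\frac{(\lambda_\alpha^2)^{(N_\alpha-3)/2}}{(1 + \lambda_\alpha^2)^{(N_\alpha+k-2)/2}}. \tag{7.12}$$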

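An extension of the procedure used to obtain (7.9) yields as the characteristic function of the simultaneous distribution of $y_1^2, y_2^2, \cdots, y_k^2$

$$\varphi(t_1, t_2, \cdots, t_k) = \frac{\Gamma\left(\frac{k-1-2i(t_1+t_2+\cdots+t_k)}{2}\right)}{\Gamma\left(\frac{k-1}{2}\right)}\prod_{\alpha=1}^{k}\frac{\Gamma\left(\frac{N_\alpha-1+2it_\alpha}{2}\right)}{\Gamma\left(\frac{N_\alpha-1}{2}\right)}. \tag{7.13}$$

Successive application of the method used to evaluate (7.10) yields as the simultaneous distribution law of the $y_\alpha^2$

$$P(y_1^2, y_2^2, \cdots, y_k^2) = \frac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{k-1}{2}\right)}\,(1 + e^{y_1^2} + \cdots + e^{y_k^2})^{-(N-1)/2}\prod_{\alpha=1}^{k}\frac{e^{y_\alpha^2(N_\alpha-1)/2}}{\Gamma\left(\frac{N_\alpha-1}{2}\right)}. \tag{7.14}$$

The simultaneous distribution of the $\lambda_\alpha^2$, defined as in (7.12), is given by

$$D(\lambda_1^2, \lambda_2^2, \cdots, \lambda_k^2) = \frac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{k-1}{2}\right)}\,(1 + \lambda_1^2 + \lambda_2^2 + \cdots + \lambda_k^2)^{-(N-1)/2}\prod_{\alpha=1}^{k}\frac{(\lambda_\alpha^2)^{(N_\alpha-3)/2}}{\Gamma\left(\frac{N_\alpha-1}{2}\right)}. \tag{7.15}$$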


Whittaker and Watson, loc. cit., pp. 283, 383. 
8. Conclusion. In this paper we have presented further instances of the applicability of the theory of characteristic functions to the distribution problem of statistics. In a subsequent paper the author hopes to illustrate the application of the results here developed to specific numerical problems.

Washington, D. C.



ON A CRITERION FOR THE REJECTION OF OBSERVATIONS AND 
THE DISTRIBUTION OF THE RATIO OF DEVIATION TO 
SAMPLE STANDARD DEVIATION 

By William R. Thompson 

Criteria for the rejection of outlying observations may be designed to reject a given fraction of all observations, or a proportion varying with the size of the sample. Irwin¹ has discussed several criteria based on sampling from a normal population which had been used previously, as well as one which he proposed. This is based on the principle of fixing the expectation of rejecting an observation from a sample independently of the aggregate number, N, of the sample. The criterion, λ, is 1/σ times the interval between successive observations in ascending order of magnitude, where σ is the standard deviation of the sampled population. In the same paper he gave, for different values of N, a table of $P_1(\lambda)$ and $P_2(\lambda)$, respectively the probabilities of exceeding given values of λ for the first or second such interval from either end. In actual use, however, σ is estimated from the sample standard deviation, and we are left to decide whether the observations in question are to be included or not in estimating the standard deviation, as well as whether or not to modify this by addition or subtraction of an estimate of its probable error. The object of the present communication is to develop a criterion free from defects of this nature, depending only on the assumption of random sampling from a normal universe. For this purpose we develop the distribution of τ defined by

$$\tau = \frac{\delta}{s}, \tag{1}$$

where s is the sample standard deviation and δ is the deviation of an arbitrary observation of the sample from the sample mean. This leads to definite criteria, which are simple in application.

Accordingly, consider a sample $\{x_i\}$, $i = 1, \cdots, N$, to be drawn at random from a normal population of unknown mean and standard deviation, and suppose that the order of enumeration is arbitrary. Then $x_N$ is an arbitrary one of the elements or observations. Now, let

$$\bar x = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad s^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar x)^2, \tag{2}$$

and

$$\delta = x_N - \bar x. \tag{3}$$

Then we will prove that the distribution of $\tau \equiv \delta/s$ in repeated sampling with a fixed aggregate number, N, is given by substitution of

$$\sqrt{n}\,z = t = \frac{\sqrt{n}\,\tau}{\sqrt{n + 1 - \tau^2}}$$

in the z or t distribution of "Student" and R. A. Fisher,² where n = N − 2.
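To this end let N > 2, let n = N − 2, and let

$$(n + 1)\,\bar x_1 = \sum_{i=1}^{N-1} x_i. \tag{4}$$

Obviously, $(n + 1)\bar x_1 + x_N = N\bar x$, whence

$$\bar x - \bar x_1 = \frac{x_N - \bar x}{n + 1} = \frac{\delta}{n + 1}, \tag{5}$$

whence

$$x_N - \bar x_1 = \delta + \frac{\delta}{n + 1} = \frac{n + 2}{n + 1}\,\delta.$$

Furthermore, $Ns^2 = \sum_{i=1}^{N-1}(x_i - \bar x_1)^2 + (n + 1)(\bar x_1 - \bar x)^2 + (x_N - \bar x)^2$, whence

$$Ns^2 = \sum_{i=1}^{N-1}(x_i - \bar x_1)^2 + \frac{n + 2}{n + 1}\,\delta^2. \tag{6}$$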
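Relations (5) and (6) are exact algebraic identities, so they can be confirmed on any sample. In the sketch below, which uses an arbitrary illustrative data set and a function name of our own, the last entry plays the role of $x_N$ and the standard deviation uses the divisor N, as in (2).

```python
def identities_5_and_6(xs):
    """Return ((lhs, rhs) of (5), (lhs, rhs) of (6)) for the sample xs,
    with xs[-1] playing the part of x_N."""
    N = len(xs)
    n = N - 2
    xbar = sum(xs) / N                           # (2)
    s2 = sum((x - xbar) ** 2 for x in xs) / N    # (2), divisor N
    delta = xs[-1] - xbar                        # (3)
    xbar1 = sum(xs[:-1]) / (n + 1)               # (4): mean of the first N - 1 values
    ss1 = sum((x - xbar1) ** 2 for x in xs[:-1])
    eq5 = (xbar - xbar1, delta / (n + 1))
    eq6 = (N * s2, ss1 + (n + 2) / (n + 1) * delta ** 2)
    return eq5, eq6


eq5, eq6 = identities_5_and_6([4.1, 3.7, 5.0, 4.4, 6.2])
print(eq5, eq6)  # each pair agrees up to floating-point rounding
```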


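Now, considering the separate samples $\{x_i\}$, $i = 1, \cdots, N - 1$, and $\{x_N\}$, of aggregate number N − 1 and 1, respectively, Fisher has shown³ that if we set

$$t = \frac{(x_N - \bar x_1)\sqrt{n}}{\sqrt{\sum_{i=1}^{N-1}(x_i - \bar x_1)^2}}\,\sqrt{\frac{n + 1}{n + 2}}, \tag{7}$$

then, for $t_0 > 0$, the probability, p, that $t < t_0$ is

$$p = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)}\int_{-\infty}^{t_0}\left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}dt, \tag{8}$$

and $P = 2(1 - p)$ is the probability that $|t| > t_0$.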
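Now, (5) and (6) in (7) give

$$t = \frac{\frac{n+2}{n+1}\,\delta\sqrt{n}\,\sqrt{\frac{n+1}{n+2}}}{\sqrt{Ns^2 - \frac{n+2}{n+1}\,\delta^2}} = \frac{\sqrt{n}\,\tau}{\sqrt{n + 1 - \tau^2}}, \tag{9}$$

whence

$$\tan\theta = \frac{t}{\sqrt{n}}.$$

Accordingly, P is the probability that $|\tau| > \tau_0 = t_0\sqrt{(n + 1)/(n + t_0^2)}$.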
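Inverting (9) gives $\tau_0 = t_0\sqrt{(n+1)/(n+t_0^2)}$, so the rejection limit follows from a two-sided t point with n = N − 2 degrees of freedom. The level is the P that makes the expected number of rejections per sample a fixed β, that is, P = β/N, with β the prescribed number discussed in the next paragraph. The following Python sketch computes that t point by bisection on a numerically integrated t distribution instead of reading Student's tables; this numerical route, the function names, and the example data are assumptions for illustration.

```python
import math


def t_tail_two_sided(t0, df, steps=2000):
    """P(|t| > t0) for Student's t with df degrees of freedom,
    using Simpson's rule for the density on [0, t0] (steps must be even)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def f(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t0 / steps
    total = f(0.0) + f(t0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return 1.0 - 2.0 * total * h / 3


def t_critical(P, df):
    """t0 such that P(|t| > t0) = P, by bracketing and bisection."""
    lo, hi = 0.0, 1.0
    while t_tail_two_sided(hi, df) > P:
        hi *= 2.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_tail_two_sided(mid, df) > P:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def tau_from_t(t0, n):
    """Invert (9): tau0 = t0 * sqrt((n + 1) / (n + t0**2))."""
    return t0 * math.sqrt((n + 1) / (n + t0 * t0))


def tau_critical(beta, N):
    """Limit tau0 giving on average beta rejections per sample of N (P = beta / N)."""
    n = N - 2
    return tau_from_t(t_critical(beta / N, n), n)


def flag_outliers(xs, beta):
    """Observations deviating from the sample mean by more than s * tau0."""
    N = len(xs)
    xbar = sum(xs) / N
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / N)  # divisor N, as in (2)
    limit = s * tau_critical(beta, N)
    return [x for x in xs if abs(x - xbar) > limit]


print(flag_outliers([1.0, 2.0, 3.0, 4.0, 100.0], 0.2))  # hypothetical data: [100.0]
```

For example, `tau_critical(0.1, 10)` should land close to the corresponding Table I entry (2.295); small differences in the third decimal can arise because the table's τ was computed from t rounded as printed.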

Thus, if we want to determine $\tau_0$ so that by rejecting all observations deviating from the sample mean by more than $s\tau_0$ we shall have an average relative frequency of rejections per sample which is fixed, say $\phi$, then we need only set $P = \phi/N$. This follows at once from the hypothesis, as $x_N$ is a random element of the random sample of $N$ elements drawn from the same normal universe (of unknown mean and standard deviation). The criterion of rejection, $s\tau_0$, is uniquely determined from the sample standard deviation and number of elements, $N$, for any prescribed $\phi$.

TABLE I

                   τ for given φ                    t for given φ
     N       0.2       0.1      0.05      0.2      0.1     0.05       n
     3   1.40646   1.41228   1.41373     9.51    19.08    38.19       1
     4    1.6454    1.6887    1.7103     4.30     6.20     8.84       2
     5     1.791     1.869     1.917     3.48     4.54     5.84       3
     6     1.895     1.997     2.067     3.19     3.97     4.84       4
     7     1.973     2.093     2.182     3.04     3.68     4.38       5
     8     2.041     2.170     2.274     2.97     3.51     4.12       6
     9     2.099     2.237     2.348     2.93     3.42     3.94       7
    10     2.144     2.295     2.413     2.89     3.36     3.83       8
    11     2.190     2.343     2.472     2.88     3.31     3.76       9
    12     2.229     2.388     2.521     2.87     3.28     3.70      10
    13     2.262     2.425     2.567     2.86     3.25     3.66      11
    14     2.296     2.463     2.598     2.86     3.24     3.60      12
    15     2.325     2.497     2.636     2.86     3.23     3.58      13
    16     2.357     2.522     2.670     2.87     3.21     3.56      14
    17     2.382     2.553     2.699     2.87     3.21     3.54      15
    18     2.404     2.576     2.733     2.87     3.20     3.54      16
    19     2.429     2.601     2.759     2.88     3.20     3.53      17
    20     2.448     2.625     2.783     2.88     3.20     3.52      18
    21     2.471     2.647     2.800     2.89     3.20     3.50      19
    22     2.487     2.661     2.819     2.89     3.19     3.49      20
    32     2.636     2.819     2.985    2.944    3.216    3.479      30
    42     2.737     2.925     3.093    2.991    3.248    3.489      40
   102     3.047     3.233     3.407    3.182    3.397    3.603     100
   202     3.266     3.448     3.621    3.347    3.546    3.736     200
   502     3.528     3.704     3.872    3.569    3.752    3.927     500
  1002     3.714     3.881     4.047    3.737    3.908    4.078    1000

P = φ/N.

Note: τ is computed to 0.5 unit in the last place given from the given t, which is believed correct to 1 unit in the last place.

Dropping the subscript, critical values of $\tau$ are given in Table I (together with corresponding values of $t$) for $\phi$ = 0.2, 0.1, and 0.05 and values of $n = N - 2$ which should be sufficient for most practical purposes. The normal deviate (for unit standard deviation and the same $P$) lies between these values and is approached by $\tau$ and $t$ (in the tabulated range of $\phi$) from opposite sides as $n$ increases, the approximation to $\tau$ being the closer of the two. Accordingly, Sheppard's tables may be used with good approximation for $n > 1000$, with $\phi/N = P$, the probability of exceeding numerically the given deviate. They may also be used to advantage in interpolation between $n = 100$ and $n = 1000$ by means of differences at the tabulated points.
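The rejection rule can be sketched in a few lines. This is an illustrative addition that assumes hypothetical data and the helper names `tau_values` and `flag_outliers`. It compares every observation's $|\tau|$, with $s$ computed using divisor $N$ as in (2), against the Table I entry for $\phi = 0.1$. Only the $N = 3$ to $10$ excerpt of the table is copied into the sketch.

```python
import math

# Excerpt of Table I: critical tau for phi = 0.1 (so P = 0.1 / N), keyed by N.
TAU_CRIT_PHI_0_1 = {3: 1.41228, 4: 1.6887, 5: 1.869, 6: 1.997,
                    7: 2.093, 8: 2.170, 9: 2.237, 10: 2.295}

def tau_values(xs):
    """tau_i = (x_i - mean) / s for every observation, s using divisor N as in (2)."""
    n_obs = len(xs)
    mean = sum(xs) / n_obs
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / n_obs)
    if s == 0:
        raise ValueError("all observations are equal; tau is undefined")
    return [(x - mean) / s for x in xs]

def flag_outliers(xs, table=TAU_CRIT_PHI_0_1):
    """Indices of observations whose |tau| exceeds the tabulated critical value."""
    crit = table[len(xs)]  # KeyError if N is outside the excerpt
    return [i for i, tau in enumerate(tau_values(xs)) if abs(tau) > crit]

sample = [10.0, 10.1, 9.9, 10.2, 9.8, 13.0]  # hypothetical measurements, N = 6
print(flag_outliers(sample))  # [5]
```

All $N$ observations are tested against the same critical value. That is why setting $P = \phi/N$ makes the expected number of rejections per sample equal to $\phi$.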

A crude rejection system, in which we reject an observation if it deviates from the mean of all the others by more than a fixed constant times the standard deviation of such a difference, in terms of $\sigma$ as estimated from the variance of these others by

$$\sigma^2 = \frac{S_1(x - \bar x_1)^2}{n},$$

amounts to taking a fixed value of $t$ as criterion. The intention is usually to fix the probability ($P$) of rejection of observations rather than the expectation of rejections per sample ($\phi$), and this, of course, is the expected approximate result for large samples. For small samples, however, say $4 < N < 32$, if we reject observations deviating thus by more than three times the standard deviation of such a difference, it appears from (7) and Table I that approximately $\phi$, rather than $P$, would be fixed.

The τ-criterion not only affords a precise extension of such a rejection system, but also reduces the actual process of application to a minimum, with one noteworthy exception for the case $N = 3$. Here we may use as criterion, with identical effect, the ratio $d_2/d_1$, where $x_1 \le x_2 \le x_3$, $d_2 = x_3 - x_2$, $d_1 = x_2 - x_1$, and $d_2 \ge d_1$. This order can always be adopted for the test (reversing the direction of measurement if necessary), and it is readily verified that

$$\frac{d_2}{d_1} = \frac{\sqrt{3}\,t - 1}{2}, \tag{11}$$

whence for $\phi$ = 0.2, 0.1, and 0.05, respectively, the critical values are $d_2/d_1$ = 7.74, 16.0, and 32.6.

Thus, for $N = 3$, we may take merely the ratio of the greater to the other numerical deviation from the median observation as criterion.
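Relation (11) can be checked by computing $t$ directly from (7) for the observation farthest from the median. The sketch below is illustrative only. It assumes hypothetical triples, and the names `t_eq7` and `ratio_critical` are made up for this example.

```python
import math

def ratio_critical(t: float) -> float:
    """Equation (11): d2/d1 = (sqrt(3) t - 1) / 2 for N = 3."""
    return (math.sqrt(3) * t - 1) / 2

def t_eq7(xs, j):
    """t of (7): x_j compared with the mean of the other N - 1 observations."""
    others = [x for i, x in enumerate(xs) if i != j]
    n = len(xs) - 2
    m1 = sum(others) / len(others)
    s1 = sum((x - m1) ** 2 for x in others)
    return (xs[j] - m1) * math.sqrt(n) / math.sqrt(s1) * math.sqrt((n + 1) / (n + 2))

# Hypothetical triple with x1 <= x2 <= x3 and d2 = 3 >= d1 = 1.
xs = [0.0, 1.0, 4.0]
d1, d2 = xs[1] - xs[0], xs[2] - xs[1]
print(d2 / d1, round(ratio_critical(t_eq7(xs, 2)), 6))  # 3.0 3.0

# Critical ratios from the N = 3 row of Table I (phi = 0.2, 0.1, 0.05);
# these print 7.74, 16.02, 32.57, i.e. 7.74, 16.0, 32.6 as quoted.
for t in (9.51, 19.08, 38.19):
    print(round(ratio_critical(t), 2))
```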


Section 2 

Although not required in connection with the rejection criterion developed above, there is a simple generalization of $\tau$ with a closely related distribution which may be valuable in somewhat different circumstances. Consider the same situation as given above, except that $\{x_i\}$ is divided into two subsets, where $i = 1, \dots, N-k$, and $i = N-k+1, \dots, N$, respectively, giving two random samples of aggregate number $N - k$ and $k$. Let the means of these be $\bar x_1$ and $\bar x_2$, respectively, and let $s$ and $\bar x$ be as before. Then in general let

$$\delta = \bar x_2 - \bar x \qquad\text{and}\qquad \tau = \frac{\delta}{s}. \tag{12}$$
TABLE II

   N P = 0.9     0.8     0.7     0.6     0.5     0.4     0.3     0.2     0.1    0.05     0.02      0.01    N
   3    .221    .437    .643    .832   1.000   1.144   1.260  1.3450  1.3968  1.4099  1.41352  1.414039    3
   4    .173    .347    .520    .693    .866   1.039   1.212   1.386   1.559  1.6454   1.6974    1.7147    4
   5    .158    .316    .476    .639    .808    .983   1.170   1.374   1.611   1.757    1.869    1.9175    5
   6    .149    .300    .453    .612    .777    .952   1.143   1.360   1.631   1.814    1.973    2.0509    6
   7    .144    .290    .440    .594    .757    .932   1.125   1.349   1.640   1.848    2.040     2.142    7
   8    .141    .284    .431    .583    .744    .918   1.111   1.340   1.644   1.870    2.087     2.207    8
   9    .139    .280    .425    .575    .734    .907   1.102   1.334   1.647   1.885    2.121     2.256    9
  10    .137    .276    .420    .569    .727    .899   1.094   1.328   1.648   1.895    2.146     2.294   10
  11    .136    .274    .416    .564    .721    .893   1.088   1.324   1.648   1.904    2.166     2.324   11
  12    .135    .272    .413    .560    .717    .888   1.083   1.320   1.649   1.910    2.183     2.348   12
  13    .134    .270    .411    .557    .713    .884   1.080   1.317   1.649   1.915    2.196     2.368   13
  14    .134    .269    .408    .554    .710    .881   1.076   1.314   1.649   1.919    2.207     2.385   14
  15    .133    .268    .407    .552    .707    .878   1.073   1.312   1.649   1.923    2.216     2.399   15
  16    .133    .267    .405    .550    .705    .875   1.071   1.310   1.649   1.926    2.224     2.411   16
  17    .132    .266    .404    .548    .703    .873   1.069   1.309   1.649   1.928    2.231     2.422   17
  18    .132    .265    .403    .547    .701    .871   1.067   1.307   1.649   1.931    2.237     2.432   18
  19    .131    .264    .402    .546    .699    .869   1.065   1.305   1.649   1.932    2.242     2.440   19
  20    .131    .264    .401    .544    .698    .868   1.063   1.304   1.649   1.934    2.247     2.447   20
  21    .130    .263    .400    .543    .697    .867   1.062   1.303   1.649   1.936    2.251     2.454   21
  22    .130    .263    .399    .542    .696    .865   1.061   1.302   1.649   1.937    2.255     2.460   22
  23    .130    .262    .398    .541    .695    .864   1.059   1.301   1.649   1.938    2.259     2.465   23
  24    .130    .262    .398    .541    .694    .863   1.058   1.300   1.649   1.940    2.262     2.470   24
  25    .130    .261    .397    .540    .693    .862   1.057   1.299   1.649   1.941    2.264     2.475   25
  26    .130    .261    .397    .539    .692    .861   1.056   1.299   1.648   1.942    2.267     2.479   26
  27    .129    .261    .397    .538    .691    .860   1.056   1.298   1.648   1.942    2.269     2.483   27
  28    .129    .261    .396    .538    .691    .860   1.055   1.297   1.648   1.943    2.272     2.487   28
  29    .129    .260    .396    .537    .690    .859   1.054   1.297   1.648   1.944    2.274     2.490   29
  30    .129    .260    .395    .537    .690    .859   1.054   1.296   1.648   1.944    2.275     2.493   30
  31    .129    .260    .395    .536    .689    .858   1.054   1.296   1.648   1.945    2.277     2.495   31
  32    .129    .260    .394    .536    .689    .858   1.053   1.295   1.648   1.945    2.279     2.498   32
   ∞  .12566  .25335  .38532  .52440  .67449  .84162 1.03643 1.28155 1.64485 1.95996  2.32634   2.57582    ∞

Note: $\tau_{(P,N,k)} = \sqrt{\dfrac{N-k}{k(N-1)}}\;\tau_{(P,N,1)}$.


Further, let $n_1 + 1 = N - k$ and $n_2 + 1 = k$, let $S_1(x - \bar x_1)^2$ be the sum of squared deviations in the first sub-sample, and let $S_2(x - \bar x_2)^2$ similarly be that for the second. Then Fisher has shown² that the generalized

$$t = \frac{\bar x_2 - \bar x_1}{\sqrt{S_1(x - \bar x_1)^2 + S_2(x - \bar x_2)^2}}\cdot\sqrt{n_1 + n_2}\cdot\sqrt{\frac{(n_1+1)(n_2+1)}{n_1 + n_2 + 2}} \tag{13}$$

is distributed as before for $n = n_1 + n_2$. Obviously,

$$N\bar x = (n_1 + 1)\,\bar x_1 + (n_2 + 1)\,\bar x_2,$$


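whence

$$\bar x_2 - \bar x = \frac{(n_1+1)(\bar x_2 - \bar x_1)}{N}, \qquad \bar x - \bar x_1 = \frac{(n_2+1)(\bar x_2 - \bar x_1)}{N}, \tag{14}$$

and

$$S_1(x - \bar x_1)^2 + S_2(x - \bar x_2)^2 = N s^2 - (n_1+1)(\bar x_1 - \bar x)^2 - (n_2+1)(\bar x_2 - \bar x)^2, \tag{15}$$

whence

$$S_1(x - \bar x_1)^2 + S_2(x - \bar x_2)^2 = N s^2 - \frac{n+2}{n_1+1}\,(n_2+1)(\bar x_2 - \bar x)^2.$$

Substituting these in (13) gives

$$t = \frac{\sqrt{nk}\;\tau}{\sqrt{n + 2 - k - k\tau^2}}, \qquad\text{where } n = N - 2; \tag{16}$$

i.e., $t = \sqrt{n}\,\tan\theta$, with $\sqrt{n+2-k}\,\sin\theta = \sqrt{k}\,\tau$.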


In connection with analysis of variance, where the total sample may be divided into several subsets of observations, the generalized $\tau$ may accordingly be used to indicate in a simple manner which (if any) of the means of subsets differ significantly from the general mean, where the equivalent $t$-test is applicable.
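As a numerical check of (16), the sketch below computes Fisher's $t$ of (13) directly from two sub-samples. It then compares that value with (16) applied to the $\tau$ of (12). This is an illustrative addition: the data are hypothetical, and the function names `t_eq13`, `tau_eq12` and `t_eq16` are made up for this example.

```python
import math

def t_eq13(first, second):
    """Fisher's t of (13) for the difference of the two sub-sample means."""
    n1, n2 = len(first) - 1, len(second) - 1
    m1, m2 = sum(first) / len(first), sum(second) / len(second)
    ss = sum((x - m1) ** 2 for x in first) + sum((x - m2) ** 2 for x in second)
    return ((m2 - m1) / math.sqrt(ss) * math.sqrt(n1 + n2)
            * math.sqrt((n1 + 1) * (n2 + 1) / (n1 + n2 + 2)))

def tau_eq12(first, second):
    """tau of (12): (mean of second subset - general mean) / s, s with divisor N."""
    xs = first + second
    N = len(xs)
    mean = sum(xs) / N
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / N)
    return (sum(second) / len(second) - mean) / s

def t_eq16(tau, N, k):
    """Equation (16): t = sqrt(n k) tau / sqrt(n + 2 - k - k tau^2), with n = N - 2."""
    n = N - 2
    return math.sqrt(n * k) * tau / math.sqrt(n + 2 - k - k * tau * tau)

first = [4.1, 3.9, 4.4, 4.0, 3.8, 4.2]   # hypothetical sub-sample, N - k = 6
second = [4.9, 5.1, 4.6]                 # hypothetical sub-sample, k = 3
tau = tau_eq12(first, second)
print(t_eq13(first, second), t_eq16(tau, N=9, k=3))  # the two values agree
```

With $k = 1$, (16) reduces to (9), so the same helper also covers the single-observation case.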

In general, let $\tau_{(P,N,k)} \ge 0$ be a number such that $P$ is the probability that $|\tau| > \tau_{(P,N,k)}$, where, as above, $N$ is the total number of observations in the whole sample, $k$ is the number of these in the subsample, and $\tau$ is defined by (12). Then by (16), obviously,

$$\tau_{(P,N,k)} = \sqrt{\frac{N-k}{k(N-1)}}\;\tau_{(P,N,1)}. \tag{17}$$

In Table II are given values of $\tau_{(P,N,1)}$ for a range of values of the arguments $N$ and $P$. The critical values of $\tau$ in Table I are simply values of this function for $P = \phi/N$, where $\phi$ is taken as parameter, i.e., $\tau_{(\phi/N,\,N,\,1)}$.
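A small sketch of (17), for illustration only. It rescales a Table II entry, $\tau_{(P,N,1)}$, to a subset of size $k$, and checks the rescaling against (16) solved for $\tau$. The function names `tau_crit_k` and `tau_from_t_k` are hypothetical.

```python
import math

def tau_crit_k(tau_crit_1: float, N: int, k: int) -> float:
    """Equation (17): tau(P, N, k) = sqrt((N - k) / (k (N - 1))) * tau(P, N, 1)."""
    return math.sqrt((N - k) / (k * (N - 1))) * tau_crit_1

def tau_from_t_k(t: float, N: int, k: int) -> float:
    """(16) solved for tau: tau = t * sqrt((N - k) / (k (N - 2 + t^2)))."""
    return t * math.sqrt((N - k) / (k * (N - 2 + t * t)))

# Table II gives tau(0.05, 10, 1) = 1.895; for a subset of k = 3 observations:
print(round(tau_crit_k(1.895, 10, 3), 3))  # 0.965
```

The critical $t$ depends only on $P$ and $n$. Mapping that same $t$ through (16) for $k = 1$ and for general $k$ therefore differs only by the factor in (17).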

Rider* has given an interesting review of rejection criteria previously proposed. 
ON CERTAIN COEFFICIENTS USED IN MATHEMATICAL STATISTICS
By Everett H. Larguier, S.J.

I. Introduction 

(1.1) We have studied here certain coefficients arising in interpolation, numerical differentiation and integration formulas in order to establish explicit expansions for these coefficients in the form of a finite summation. Ordinarily they are obtained by means of recursion relations, which necessarily demand the building up of a complete table in order to find the desired set of coefficients. By using the methods described in this paper, we are able to calculate any desired set independently of the ones which precede it in the table. In the literature we find two other expansions of the difference quotients of zero, one by Jeffery¹ and one by Boole.² Our expansion for the differential quotients of zero is the same as one obtained by Jeffery;³ however, our proof is more elementary and simple.

The Bernoulli numbers also find a wide range of application in many finite integration formulas, and hence our attention was drawn to the discussion of certain coefficients which occur in the study of these functions.⁴ As in the cases mentioned above, these coefficients are likewise ordinarily obtained by recursion formulas, but by our expansions they may be obtained directly.

II. Difference Quotients of Zero 

(2.1) It is our purpose here to show that this difference quotient of zero, $\Delta^m 0^n$, may be expressed by the following summation:

$$\Delta^m 0^n = m!\sum m^{\,n-m-a_1}(m-1)^{a_1-a_2}(m-2)^{a_2-a_3}\cdots 2^{a_{m-2}-a_{m-1}}\,1^{a_{m-1}}, \tag{1}$$

where $a_1, a_2, \dots, a_{m-1} = 0, 1, 2, \dots, n-m$ and $a_1 \ge a_2 \ge \dots \ge a_{m-1} \ge 0$.

Obviously the number of terms in the summation is the number of combinations of $n - m + 1$ things taken $m - 1$ together where repetitions are allowed.
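The following Python sketch evaluates (1) by brute force and compares the result with the standard alternating-sum expression $\Delta^m 0^n = \sum_j (-1)^{m-j}\binom{m}{j} j^n$. It is an illustrative addition, not the author's method. The function names are hypothetical. The cases $m = 0$ and $m > n$ follow the conventions adopted in (2.2).

```python
from itertools import combinations_with_replacement
from math import comb, factorial

def delta_zero_explicit(m: int, n: int) -> int:
    """Delta^m 0^n via the alternating sum over j of (-1)^(m-j) C(m, j) j^n."""
    return sum((-1) ** (m - j) * comb(m, j) * j ** n for j in range(m + 1))

def delta_zero_eq1(m: int, n: int) -> int:
    """Equation (1), with the conventions of (2.2) for m = 0 and m > n."""
    if m == 0:
        return 1 if n == 0 else 0
    if m > n:
        return 0
    total = 0
    # Each multiset of m - 1 values from 0..n-m, sorted descending, is one
    # admissible (a_1 >= ... >= a_{m-1}); pad with a_0 = n - m and a_m = 0.
    for combo in combinations_with_replacement(range(n - m + 1), m - 1):
        a = [n - m] + sorted(combo, reverse=True) + [0]
        term = 1
        for i in range(m):
            term *= (m - i) ** (a[i] - a[i + 1])  # bases m, m - 1, ..., 1
        total += term
    return factorial(m) * total

print(delta_zero_eq1(3, 4), delta_zero_explicit(3, 4))  # 36 36
```

The loop visits exactly the combinations with repetition counted above, so the number of terms is $\binom{n-1}{m-1}$.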

(2.2) By means of the recursion relation⁵

$$\Delta^m 0^n = m\,\Delta^m 0^{n-1} + m\,\Delta^{m-1} 0^{n-1} \tag{2}$$

¹ Henry M. Jeffery, "On a method of expressing the combinations and homogeneous products of numbers and their powers by means of differences of nothing," Quarterly Journal of Pure and Applied Mathematics, vol. 4 (1861), pp. 364 ff.

² George Boole, A Treatise on the Calculus of Finite Differences (Stechert, N. Y.), p. 20.

³ Loc. cit.

⁴ Steffensen, Interpolation (Williams & Wilkins, Baltimore), p. 125.

⁵ L. M. Milne-Thomson, Calculus of Finite Differences (Macmillan), p. 36, sec. 2.53, (2).
we are able to build up a table of values. By substitution it can be shown that (1) satisfies the values of this table except when $m = 0, 1$ and for $m > n$, for then the summation becomes meaningless. We therefore define the summation to have the value 0 for $m = 0$, $n > 0$ and for $m > n$, and the value 1 for $m = 1$. We exhibit one substitution below. When $m = 3$ and $n = 4$,

$$\Delta^3 0^4 = 3!\,\{3^1 + 2^1 + 1^1\} = 36.$$
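(2.3) Taking (2), we proceed by repeated application of the recursion formula and finally we have

$$\Delta^m 0^n = m^{n-m}\,\Delta^m 0^m + m\sum_{d=m}^{n-1} m^{\,n-1-d}\,\Delta^{m-1} 0^d,$$

which, since $\Delta^m 0^m = m!$,⁶ becomes

$$\Delta^m 0^n = m!\,m^{n-m} + m\sum_{d=m}^{n-1} m^{\,n-1-d}\,\Delta^{m-1} 0^d. \tag{3}$$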
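We will now prove (1). Proceeding by induction, we assume (1) true for $m - 1$. Hence from (3) we have

$$\Delta^m 0^n = m!\,m^{n-m} + m\sum_{d=m}^{n-1} m^{\,n-1-d}\,(m-1)!\sum (m-1)^{d-m+1-a_1}(m-2)^{a_1-a_2}\cdots 1^{a_{m-2}},$$

where $a_1, a_2, \dots, a_{m-2} = 0, 1, 2, \dots, d-m+1$ and $a_1 \ge a_2 \ge \dots \ge a_{m-2} \ge 0$. This becomes

$$\Delta^m 0^n = m!\,m^{n-m} + m!\sum_{d=m}^{n-1}\sum m^{\,n-1-d}(m-1)^{d-m+1-a_1}(m-2)^{a_1-a_2}\cdots 1^{a_{m-2}}. \tag{4}$$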


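Using the symbol $\Sigma_2$ for the double summation of (4), we may write

$$\Sigma_2 = \sum m^{\,n-m-a_1}(m-1)^{a_1-a_2}\cdots 1^{a_{m-1}},$$

where we have put $a_1 = d - m + 1$ and renamed the former $a_1, \dots, a_{m-2}$ as $a_2, \dots, a_{m-1}$; here $a_1$ takes the values $1, 2, \dots, n-m$ as $d$ varies from $m$ to $n - 1$. Hence, by including $m^{n-m}$ under the summation (it is the term with $a_1 = a_2 = \dots = a_{m-1} = 0$), we are able to replace the double summation by a single one and have

$$\Delta^m 0^n = m!\sum m^{\,n-m-a_1}(m-1)^{a_1-a_2}\cdots 1^{a_{m-1}},$$

where $a_1, a_2, \dots, a_{m-1} = 0, 1, 2, \dots, n-m$ and $a_1 \ge a_2 \ge \dots \ge a_{m-1} \ge 0$.

Hence (1) is proved.⁷

⁶ Milne-Thomson, loc. cit.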


III. Differential Quotients of Zero 

(3.1) In Markoff's formula for numerical differentiation we meet coefficients of the type $D^m 0^{(n)}$, the $m$th derivative at $x = 0$ of the factorial $x^{(n)} = x(x-1)\cdots(x-n+1)$. We will show here that this differential quotient of zero may be expressed by the following finite sum:

$$D^m 0^{(n)} = (-1)^{n-m}\,m!\sum (p_1 p_2 \cdots p_{n-m}), \tag{5}$$

where $p_1 > p_2 > \dots > p_{n-m} > 0$ take on values from $1, 2, \dots, n-1$. Obviously the number of terms in the expansion will be the same as the number of combinations of $n - 1$ things taken $n - m$ together without repetitions.

(3.2) By means of the recursion formula⁸

$$D^m 0^{(n)} = -(n-1)\,D^m 0^{(n-1)} + m\,D^{m-1} 0^{(n-1)} \tag{6}$$

we are able to build up a table of values. By substitution it can easily be shown that (5) satisfies the values of the table when $n > m > 0$. For the other values the summation is meaningless; hence we define it to have the value 1 for $m = n > 0$, and the value 0 for $m > n$ and $m = 0$. When $m = 2$ and $n = 4$, we have

$$D^2 0^{(4)} = (-1)^{4-2}\,2!\,\{(3\cdot 2) + (3\cdot 1) + (2\cdot 1)\} = 22,$$

which is the same value as found by (6).
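The three routes to $D^m 0^{(n)}$ can be cross-checked in Python: expansion (5), recursion (6), and direct expansion of $x(x-1)\cdots(x-n+1)$. This is an illustrative sketch under the conventions of (3.2). The function names are hypothetical, and the polynomial-coefficient route is a swapped-in reference, not the paper's method.

```python
from itertools import combinations
from math import factorial, prod

def falling_coeffs(n: int) -> list:
    """Coefficients c[k] of x(x - 1)...(x - n + 1) = sum_k c[k] x^k."""
    coeffs = [1]
    for j in range(n):  # multiply the current polynomial by (x - j)
        new = [0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            new[k + 1] += c
            new[k] -= j * c
        coeffs = new
    return coeffs

def d_zero_direct(m: int, n: int) -> int:
    """D^m 0^(n): m! times the coefficient of x^m in the factorial x^(n)."""
    c = falling_coeffs(n)
    return factorial(m) * c[m] if m < len(c) else 0

def d_zero_eq5(m: int, n: int) -> int:
    """Equation (5), with the conventions of (3.2): 0 for m = 0 or m > n."""
    if m == 0 or m > n:
        return 0
    s = sum(prod(p) for p in combinations(range(1, n), n - m))  # distinct p's
    return (-1) ** (n - m) * factorial(m) * s

def d_zero_eq6(m: int, n: int) -> int:
    """Recursion (6), started from x^(0) = 1."""
    if n == 0:
        return 1 if m == 0 else 0
    if m == 0:
        return 0
    return -(n - 1) * d_zero_eq6(m, n - 1) + m * d_zero_eq6(m - 1, n - 1)

print(d_zero_eq5(2, 4), d_zero_eq6(2, 4), d_zero_direct(2, 4))  # 22 22 22
```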

⁷ Our expansion may be shown to be equal to that of Jeffery cited in the introduction, which expresses $\Delta^m 0^n$ by means of the sum of all the homogeneous products of $n$ dimensions which can be formed by the first $m$ natural numbers and their powers. The proof of Jeffery's expansion involves the use of complicated symbolic operators, while our proof uses elementary notions only.

⁸ Steffensen, op. cit., pp. 57, 58, (12) and (14).
(3.3) Returning to (6), we obtain by its repeated application:

$$D^m 0^{(n)} = (-1)^{n-m}(n-1)^{(n-m)}\,D^m 0^{(m)} + m\sum_{a=0}^{n-m-1}(-1)^a (n-1)^{(a)}\,D^{m-1} 0^{(n-1-a)},$$

or, since $D^m 0^{(m)} = m!$,

$$D^m 0^{(n)} = (-1)^{n-m}\,m!\,(n-1)^{(n-m)} + m\sum_{a=0}^{n-m-1}(-1)^a (n-1)^{(a)}\,D^{m-1} 0^{(n-1-a)}. \tag{7}$$

In proving (5), we proceed by induction, assuming (5) true for $m - 1$; hence by (7) we have

$$D^m 0^{(n)} = (-1)^{n-m}\,m!\,(n-1)^{(n-m)} + (-1)^{n-m}\,m!\sum_{a=0}^{n-m-1}(n-1)^{(a)}\sum (p_1 p_2 \cdots p_{n-m-a}), \tag{8}$$

where $p_1 > p_2 > \dots > p_{n-m-a} > 0$ take the values $1, 2, \dots, n-a-2$. Expanding the double sum of (8), we have

$$\sum_{a=0}^{n-m-1}(n-1)^{(a)}\sum (p_1 \cdots p_{n-m-a}) = \sum^{n-2}(p_1 \cdots p_{n-m}) + (n-1)\sum^{n-3}(p_1 \cdots p_{n-m-1}) + (n-1)(n-2)\sum^{n-4}(p_1 \cdots p_{n-m-2}) + \cdots + (n-1)(n-2)\cdots(m+1)\sum^{m-1}(p_1), \tag{9}$$

in which $p_1 > p_2 > \dots > p_s > 0$ always holds, where $s = n-m, n-m-1, \dots, 2, 1$ in turn, and the upper index on each summation is the largest value $p_1$ may take.

Upon inspection, it is evident that (9) contains all the terms of (5) with the exception of $(n-1)(n-2)\cdots(m+1)m$. Hence, since by definition $(n-1)^{(n-m)} = (n-1)\cdots(m+1)m$, we may include the first term on the right-hand side of (8) under the summation, and then we have proved (5).⁹
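The unrolled recursion (7) can be verified numerically against recursion (6). In this sketch, $x^{(a)}$ is the factorial power $x(x-1)\cdots(x-a+1)$, taken as 1 when $a = 0$. It is an illustrative addition, and the helper names `d_zero`, `falling` and `rhs_eq7` are hypothetical.

```python
from math import factorial, prod

def d_zero(m: int, n: int) -> int:
    """Reference values of D^m 0^(n) from recursion (6)."""
    if n == 0:
        return 1 if m == 0 else 0
    if m == 0:
        return 0
    return -(n - 1) * d_zero(m, n - 1) + m * d_zero(m - 1, n - 1)

def falling(x: int, a: int) -> int:
    """Factorial power x^(a) = x (x - 1) ... (x - a + 1); equals 1 when a = 0."""
    return prod(range(x, x - a, -1))

def rhs_eq7(m: int, n: int) -> int:
    """Right-hand side of (7), for 1 <= m <= n."""
    head = (-1) ** (n - m) * factorial(m) * falling(n - 1, n - m)
    tail = m * sum((-1) ** a * falling(n - 1, a) * d_zero(m - 1, n - 1 - a)
                   for a in range(n - m))
    return head + tail

print(rhs_eq7(2, 4), d_zero(2, 4))  # 22 22
```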

IV. The Coefficient $G_n^{(r)}$

(4.1) In discussing the Bernoulli numbers and the Bernoulli polynomials, Steffensen¹⁰ makes use of the relation

$$B_{2r}(x) = (-1)^r\sum_{n=0}^{r} G_n^{(r)} z^{r-n}, \tag{10}$$

⁹ Jeffery's expansion referred to in the introduction expresses $D^m 0^{(n)}$ by means of the sum of the combinations of the first $n - 1$ natural numbers taken $n - m$ together. The remarks made above under article 2.3 concerning symbolic operators also apply here mutatis mutandis.

¹⁰ Op. cit., p. 125, (24); cf. also Jacobi's theorem, Journal für die reine und angewandte Mathematik (Crelle's Journal), vol. 12, pp. 268-269.
where $z = x - x^2$. We wish here to show that the coefficient $G_n^{(r)}$, ordinarily found by means of recursion formulas, may be obtained from the following summation:

$$G_n^{(r)} = (2r)^{(2n)}\sum_{N_n=3}^{r-n+1}[N_n]\sum_{N_{n-1}=3}^{N_n+1}[N_{n-1}]\cdots\sum_{N_1=3}^{N_2+1}[N_1], \tag{11}$$

where $[N] = N^{(2)}/(2N)^{(4)}$. Obviously the summation has no meaning for $n = 0$, nor for $r < n + 2$. Therefore it will be necessary to make definitions or devise other schemes for meeting this difficulty.

Steffensen¹¹ shows that

$$G_0^{(r)} = 1 \text{ for } r \ge 0; \qquad G_{r-1}^{(r)} = 0 \text{ for } r > 1; \tag{12}$$

and likewise he gives the following recursion relation:

$$(2r - 2n)^{(2)}\,G_n^{(r)} = (2r)^{(2)}\,G_n^{(r-1)} + (r - n + 1)^{(2)}\,G_{n-1}^{(r)}. \tag{13}$$

In accordance with (12), we define the sum of (11) to be equal to 1 for $n = 0$, and to be equal to 0 for $n = r - 1$ when $r > 1$. By means of the recursion formula (13), Steffensen¹² gives a table of values of $G_n^{(r)}$, which (11) may be easily shown to satisfy. From this table we have the value $G_3^{(6)} = 10$. Using this as an example of the expansion, we have by (11):

$$\begin{aligned}
G_3^{(6)} &= (12)^{(6)}\sum_{N_3=3}^{4}[N_3]\sum_{N_2=3}^{N_3+1}[N_2]\sum_{N_1=3}^{N_2+1}[N_1]\\
&= (12)^{(6)}\Big([3]\big\{[4]([5]+[4]+[3]) + [3]([4]+[3])\big\} + [4]\big\{[5]([6]+[5]+[4]+[3]) + [4]([5]+[4]+[3]) + [3]([4]+[3])\big\}\Big)\\
&= 10.
\end{aligned}$$
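The sketch below evaluates (11) with exact rational arithmetic and compares it with the recursion (12)-(13). It reproduces $G_3^{(6)} = 10$. This is an illustrative addition. It assumes the factorial-power reading $x^{(k)} = x(x-1)\cdots(x-k+1)$ used throughout the paper, and the helper names are hypothetical. Because (11) is meaningless for $n = r$, both routes are used only for $0 \le n \le r-1$.

```python
from fractions import Fraction
from functools import lru_cache
from math import prod

def falling(x: int, k: int) -> int:
    """Factorial power x^(k) = x (x - 1) ... (x - k + 1)."""
    return prod(range(x, x - k, -1))

def bracket(N: int) -> Fraction:
    """[N] = N^(2) / (2N)^(4), as defined after (11)."""
    return Fraction(falling(N, 2), falling(2 * N, 4))

def nested(depth: int, upper: int) -> Fraction:
    """sum_{N_depth=3}^{upper} [N_depth] * nested(depth - 1, N_depth + 1)."""
    if depth == 0:
        return Fraction(1)
    return sum((bracket(Nd) * nested(depth - 1, Nd + 1) for Nd in range(3, upper + 1)),
               Fraction(0))

def g_sum(n: int, r: int) -> Fraction:
    """Equation (11), with the value 1 for n = 0; valid for 0 <= n <= r - 1."""
    if n == 0:
        return Fraction(1)
    return falling(2 * r, 2 * n) * nested(n, r - n + 1)

@lru_cache(maxsize=None)
def g_rec(n: int, r: int) -> Fraction:
    """G_n^(r) from (12) and (13), for 0 <= n <= r - 1."""
    if n == 0:
        return Fraction(1)
    if n == r - 1:
        return Fraction(0)  # (12), r > 1
    return (falling(2 * r, 2) * g_rec(n, r - 1)
            + falling(r - n + 1, 2) * g_rec(n - 1, r)) / falling(2 * r - 2 * n, 2)

print(g_sum(3, 6), g_rec(3, 6))  # 10 10
```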

(4.2) Before proving the general case, we will prove by induction that

$$G_1^{(r)} = (2r)^{(2)}\sum_{N_1=3}^{r}[N_1]. \tag{14}$$

Assuming (14) true for $r - 1$, we have by (12) and (13)

$$G_1^{(r)} = (2r)^{(2)}\sum_{N_1=3}^{r-1}[N_1] + (2r)^{(2)}[r] = (2r)^{(2)}\sum_{N_1=3}^{r}[N_1].$$

Hence (14) is valid.

(4.3) We shall prove (11) by induction with respect to $r$. By repeated application of (13), we have

¹¹ Op. cit., p. 125.

¹² Op. cit., p. 126.
$$\begin{aligned}
G_n^{(r)} ={}& \frac{(2r)^{(2)}}{(2r-2n)^{(2)}}\,G_n^{(r-1)} + \frac{(2r)^{(2)}(r-n+1)^{(2)}}{(2r-2n+2)^{(4)}}\,G_{n-1}^{(r-1)} + \frac{(2r)^{(2)}(r-n+1)^{(2)}(r-n+2)^{(2)}}{(2r-2n+4)^{(6)}}\,G_{n-2}^{(r-1)} + \cdots\\
&+ \frac{(2r)^{(2)}(r-n+1)^{(2)}\cdots(r-1)^{(2)}}{(2r-2)^{(2n)}}\,G_1^{(r-1)} + \frac{(r-n+1)^{(2)}\cdots(r)^{(2)}}{(2r-2)^{(2n)}}\,G_0^{(r)}\\
={}& (2r)^{(2n)}\sum_{N_n=3}^{r-n}[N_n]\cdots\sum_{N_1=3}^{N_2+1}[N_1] + (2r)^{(2n)}[r-n+1]\sum_{N_{n-1}=3}^{r-n+1}[N_{n-1}]\cdots\sum_{N_1=3}^{N_2+1}[N_1]\\
&+ (2r)^{(2n)}[r-n+1][r-n+2]\sum_{N_{n-2}=3}^{r-n+2}[N_{n-2}]\cdots\sum_{N_1=3}^{N_2+1}[N_1] + \cdots\\
&+ (2r)^{(2n)}[r-n+1][r-n+2]\cdots[r-1]\sum_{N_1=3}^{r-1}[N_1] + (2r)^{(2n)}[r-n+1]\cdots[r].
\end{aligned}$$

It is evident from inspection that this is nothing but an expanded form of (11); hence (11) is proved with respect to $r$.

(4.4) Proceeding in the same way as above to prove (11) by induction with respect to $n$, we have, again by repeated application of (13),

$$\begin{aligned}
G_n^{(r)} ={}& \frac{(r-n+1)^{(2)}}{(2r-2n)^{(2)}}\,G_{n-1}^{(r)} + \frac{(2r)^{(2)}(r-n)^{(2)}}{(2r-2n)^{(4)}}\,G_{n-1}^{(r-1)} + \frac{(2r)^{(4)}(r-n-1)^{(2)}}{(2r-2n)^{(6)}}\,G_{n-1}^{(r-2)} + \cdots\\
&+ \frac{(2r)^{(2r-2n-4)}(3)^{(2)}}{(2r-2n)^{(2r-2n-2)}}\,G_{n-1}^{(n+2)}\\
={}& (2r)^{(2n)}[r-n+1]\sum_{N_{n-1}=3}^{r-n+2}[N_{n-1}]\cdots\sum_{N_1=3}^{N_2+1}[N_1] + (2r)^{(2n)}[r-n]\sum_{N_{n-1}=3}^{r-n+1}[N_{n-1}]\cdots\sum_{N_1=3}^{N_2+1}[N_1]\\
&+ \cdots + (2r)^{(2n)}[4]\sum_{N_{n-1}=3}^{5}[N_{n-1}]\cdots\sum_{N_1=3}^{N_2+1}[N_1] + (2r)^{(2n)}[3]\sum_{N_{n-1}=3}^{4}[N_{n-1}]\cdots\sum_{N_1=3}^{N_2+1}[N_1].
\end{aligned}$$

From this latter equation, (11) follows immediately, and therefore the proof is complete.

(4.5) Bernoulli numbers may be expressed in terms of this coefficient $G_n^{(r)}$, as is shown by Steffensen,¹³ in the following way:

$$B_{2r} = (-1)^r\,G_r^{(r)}, \tag{15}$$

¹³ Op. cit., p. 125, (27).
which we shall express in terms of (11). However, as (11) is meaningless for $n = r$, we obtain the relation

$$(2r + 2)^{(2)}\,G_r^{(r)} = -2\,G_{r-1}^{(r+1)} \quad\text{for } r > 0, \tag{16}$$

which follows immediately from (12) and (13), and thereby obviate this difficulty. Hence, by (11), (15) and (16), we can write

$$B_{2r} = \frac{(-1)^{r+1}(2r)!}{(4)^{(2)}}\sum_{N_{r-1}=3}^{3}[N_{r-1}]\sum_{N_{r-2}=3}^{N_{r-1}+1}[N_{r-2}]\cdots\sum_{N_1=3}^{N_2+1}[N_1]. \tag{17}$$

We note here that the definitions of the summation, given in 4.1, likewise hold.
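As a check on (15)-(17), the sketch below computes $B_{2r}$ from the nested sum of (17) and also through (16) and (11). It compares both with the familiar values $1/6, -1/30, 1/42, -1/30, 5/66$. The comparison assumes the convention in which the even-index Bernoulli numbers alternate in sign, which is our reading of the normalization behind (15). The helper names are hypothetical.

```python
from fractions import Fraction
from math import factorial, prod

def falling(x: int, k: int) -> int:
    """Factorial power x^(k) = x (x - 1) ... (x - k + 1)."""
    return prod(range(x, x - k, -1))

def bracket(N: int) -> Fraction:
    """[N] = N^(2) / (2N)^(4)."""
    return Fraction(falling(N, 2), falling(2 * N, 4))

def nested(depth: int, upper: int) -> Fraction:
    """Nested sums of (11); depth 0 gives the empty product 1."""
    if depth == 0:
        return Fraction(1)
    return sum((bracket(Nd) * nested(depth - 1, Nd + 1) for Nd in range(3, upper + 1)),
               Fraction(0))

def g_sum(n: int, r: int) -> Fraction:
    """Equation (11), with the value 1 for n = 0."""
    return Fraction(1) if n == 0 else falling(2 * r, 2 * n) * nested(n, r - n + 1)

def bernoulli_eq17(r: int) -> Fraction:
    """B_{2r} from (17): (-1)^(r+1) (2r)! / 4^(2) times the sum with N_{r-1} = 3."""
    return Fraction((-1) ** (r + 1) * factorial(2 * r), falling(4, 2)) * nested(r - 1, 3)

def bernoulli_eq15_16(r: int) -> Fraction:
    """B_{2r} from (15), with G_r^(r) obtained through (16) and (11)."""
    g_rr = Fraction(-2) * g_sum(r - 1, r + 1) / falling(2 * r + 2, 2)
    return (-1) ** r * g_rr

print([str(bernoulli_eq17(r)) for r in range(1, 6)])
# ['1/6', '-1/30', '1/42', '-1/30', '5/66']
```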

Saint Louis University 
Saint Louis, Missouri 



NOTICE OF THE ORGANIZATION OF THE INSTITUTE OF 
MATHEMATICAL STATISTICS 

For some time there has been a feeling that the theory of statistics would be advanced in the United States by the formation of an organization of those persons especially interested in the mathematical aspects of the subject. As a consequence, a meeting of interested persons was arranged for September 12, 1935, at Ann Arbor, Michigan. At the meeting, it was decided to form an organization to be known as the Institute of Mathematical Statistics. A constitution and by-laws were adopted and the following officers elected to serve until December 31st, 1936: President, H. L. Rietz; Vice-president, W. A. Shewhart; Secretary-Treasurer, A. T. Craig. A resolution instructing the officers to investigate the feasibility of the affiliation of the Institute with the American Mathematical Society or with the American Statistical Association was adopted.

The constitution provides that membership in the Institute shall consist of Members, Fellows, Honorary Members, and Sustaining Members. A committee on membership will establish qualifications requisite for the different grades of membership. The annual dues of members and fellows are five dollars a year, and these include a year's subscription to the official journal, the Annals of Mathematical Statistics.

The next meeting of the Institute will be held in St. Louis, Missouri, in 
December of this year in connection with the meetings of the American Associa- 
tion for the Advancement of Science, the American Mathematical Society, and 
other organizations. 

Forms for application for membership in the Institute may be had by writing 
the Secretary-Treasurer at the University of Iowa, Iowa City, Iowa. 
1935 DIRECTORY OF SUBSCRIBERS TO THE ANNALS OF
MATHEMATICAL STATISTICS

INDIVIDUALS 

Acerboni, Dr. Argentino V., Banfield, Larroque 232, U. T. 94, Argentine. 

Armstrong, Charles M., Jr., 1338 Dean Street, Schenectady, N. Y. 

Anderson, Robert, 1104 New Federal Building, St. Paul, Minn. 

Aroian, Leo A., Colorado State College, Fort Collins, Colo.

Bachelor, Robert W., 1437 Bancroft Way, Berkeley, Calif. 

Bailey, W. B., The Travelers Insurance Company, Hartford, Conn. 